Order-Statistic Filtering and Smoothing of Time-Series: Part II
Kenneth E. Barner* and Gonzalo R. Arce†
* Applied Science and Engineering Laboratories, Department of Electrical Engineering, University of Delaware, Newark, Delaware
Email: barner@udel.edu
† Department of Electrical Engineering, University of Delaware, Newark, Delaware
Abstract
This is the second paper of a two-part tutorial on the fundamentals of univariate time-series filtering using order-statistics, where both temporal and rank orderings are considered jointly. This second paper focuses on order-statistic selection filters, where the filter output is restricted to be one of the input samples. In particular, we treat the class of Weighted Order Statistic (WOS) filters and the more general class of Permutation Weighted Order Statistic (PWOS) filters. By combining temporal- and rank-order based weighting with order-statistic selection, detail and edge preserving filters that are robust to outliers and sample contamination can be constructed. Like their weighted sum counterparts, these selection filters can be applied to the smoothing, filtering, and forecasting of time-series. Furthermore, selection filters can be optimized as a function of the underlying signal statistics. While the weighted sum filter optimization is formulated under the Mean Squared Error, the selection filters utilize the more robust Mean Absolute Error (MAE) criterion. This MAE optimization and the selection based estimates result in a robust class of filters that has advantages over the weighted sum counterparts in many applications. The evolution of these filters is covered and illustrative examples are given demonstrating the properties and performance of this class of estimators.
Invited paper to appear in the "Handbook of Statistics: Order Statistics and Their Applications," C. R. Rao and N. Balakrishnan, Editors.
BARNER AND ARCE CONTENTS
Contents
1 Introduction
2 The Median Filter
  2.1 The Running Median Filter
  2.2 Statistical Properties
  2.3 Deterministic Properties
  2.4 Median Filtering and Threshold Decomposition
3 Weighted Median Filters
  3.1 Center Weighted Median Filters
  3.2 Weighted Median Filters
  3.3 Weighted Order Statistic Filters
  3.4 Threshold Decomposition and Logic
4 Time-Rank Coupling Extensions: PWOS Filters
  4.1 L�� PWOS filters
  4.2 $L^j\ell$ PWOS filters
  4.3 PWOS Filter Lattices
  4.4 Model Order (Complexity) Reduction
5 Optimization Techniques
  5.1 Problem Formulation
  5.2 Algorithm I
    5.2.1 The Steepest Descent and LMS Algorithms
  5.3 Algorithm II
6 Applications to Image Restoration
7 Conclusion
1 Introduction
Data time-series occur naturally in numerous fields of study including economics, engineering, medicine, and many social fields. These time-series must often be processed, or filtered, to extract some information of interest. Traditionally, this filtering has been linear. Certainly, linear filters have a sound theoretical basis and have been extensively studied. Unfortunately, linear filters suffer from poor performance in many applications. Among the signals that linear filters perform poorly on are those with changing levels and corrupting noise that is either heavy tailed or signal dependent ���� This poor performance has led to the investigation of nonlinear filtering methodologies.
The design of nonlinear filters can follow many approaches since there is no single underlying theory of nonlinear filters. Thus, nonlinear filters range from simple ad hoc methods designed to tackle a single problem, to increasingly theoretically founded approaches that are more widely applicable. One nonlinear filtering approach that has received considerable attention, and for which much theoretical study has been conducted, is that based on rank-order. Indeed, much attention has been paid to rank-order filters since the running median filter was first applied to the smoothing of time-series by Tukey in ���� ��� �� �� �� �� ���
The rank-ordering of samples allows the design of filter structures that (a) are robust in environments where the assumed statistics deviate from Gaussian models and are possibly contaminated with outliers, and (b) track signal discontinuities without introducing the transient or blurring artifacts that linear filters do. Filter classes that operate on rank-order information can be broadly divided into two categories according to how the estimate is formed: weighted sum and selection type. The weighted sum type filters form estimates by weighting the input samples, often as a function of temporal- and rank-order, and then summing the weighted samples to obtain an estimate. Such filters were discussed in Part I of this paper. The selection type filters take a different approach, restricting the output to be one of the input samples. As in the weighted sum case, the input samples can be weighted to reflect importance, but the filter output must be one of the observation samples.
Selection rank-order filters have advantages over their weighted sum counterparts in many applications. This is particularly true for signals with numerous edges, such as images or biomedical signals where the measured process can change states abruptly. Weighted sum based filters tend to blur the edges of such signals, even if their weights are a function of temporal- and rank-order. In images, accurate tracking of edges is vital due to the nonlinear nature of the human visual system.
Selection type filters have considerable advantages in edge tracking as compared to weighted sum filters. To illustrate this and motivate the selection approach, consider the raster-scan order filtering of an image corrupted by impulsive type noise. A common approach to limiting the effect of impulsive outliers is through trimming. In a weighted sum approach, this leads to the $\alpha$-trimmed mean. The output of this filter at instant $n$ is

$$y(n) = \frac{1}{N - 2(\alpha - 1)} \sum_{i=\alpha}^{N-\alpha+1} x_{(i)}, \tag{1}$$

where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$ are the $N$ observed samples in rank order. Thus, the $\alpha$-trimmed mean averages over all but the $\alpha - 1$ smallest and $\alpha - 1$ largest samples. If $\alpha = 1$, the sample mean is realized, while for $\alpha = (N+1)/2$ the sample median is realized. A comparable trimming method that is selection type is the center weighted median ��� which can be expressed as

$$y(n) = \mathrm{MED}[x_{(\alpha)}, x(n), x_{(N-\alpha+1)}]. \tag{2}$$

For this filter, the output is identical to the input as long as $x_{(\alpha)} \le x(n) \le x_{(N-\alpha+1)}$. If $x(n)$ is outside this range, the output is trimmed to either $x_{(\alpha)}$ or $x_{(N-\alpha+1)}$. To compare the weighted sum and selection approaches, consider the single image scan line shown in Fig. 1. This figure shows the original scan line, the scan corrupted by impulsive noise, and the running trimming statistics $x_{(\alpha)}$ and $x_{(N-\alpha+1)}$. As the figure shows, these statistics form a band between which the samples are either averaged (weighted sum approach) or the input is passed to the output (selection approach). Figure 2 shows the results of the two filtering operations. While both suppress outliers, the selection approach clearly performs better than the weighted sum approach, which excessively smooths all edges. The advantages of the selection approach can be seen more clearly by examining the image in Fig. 3, whose upper left quarter is the original "aerial" image; the upper right quarter is the corresponding quarter of the image corrupted by noise; the bottom left quarter is the output of a weighted-sum type order-statistic filter; and the bottom right quarter is the output of a selection-type order-statistic filter. Both filters operate on a raster scan and have a width of seven.

Figure 1: A single scan line from the image "aerial". The original, corrupted, and running order-statistics $x_{(\alpha)}$ and $x_{(N-\alpha+1)}$ are shown (legend: Original Signal, Noisy Signal, Trimming Order Statistic). The corruption is additive Laplacian noise with probability of occurrence ��� and � � ��� Also� N � �� and � � ��

Figure 2: The output of the selection and weighted sum filters operating on the corrupted scan line in Fig. 1 (legend: Original Signal, Selection Filter Output, Weighted Sum Filter Output).
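The contrast between the two trimming approaches can be sketched in a few lines of Python. This is only an illustration, assuming the trimming statistics are the $\alpha$-th smallest and $\alpha$-th largest samples of the window; the function names `trimmed_mean` and `trimming_selection` are hypothetical and not part of the original text.

```python
def trimmed_mean(window, alpha):
    """Alpha-trimmed mean: average the ranked samples x_(alpha) .. x_(N-alpha+1).

    Assumes 1 <= alpha <= (N + 1) / 2 for an odd window size N.
    """
    n = len(window)
    ranked = sorted(window)
    # 1-based ranks alpha..N-alpha+1 map to the 0-based slice below.
    kept = ranked[alpha - 1 : n - alpha + 1]
    return sum(kept) / len(kept)


def trimming_selection(window, alpha):
    """Selection counterpart: median of x_(alpha), the center sample, x_(N-alpha+1)."""
    n = len(window)
    ranked = sorted(window)
    center = window[n // 2]           # sample at the window center, x(n)
    low = ranked[alpha - 1]           # x_(alpha)
    high = ranked[n - alpha]          # x_(N-alpha+1)
    return sorted([low, center, high])[1]


edge = [10, 10, 10, 50, 50, 50, 50]        # window centered on a step edge
impulse = [10, 10, 10, 255, 10, 10, 10]    # window centered on an outlier

edge_mean = trimmed_mean(edge, 3)             # a value between the two levels
edge_select = trimming_selection(edge, 3)     # the center sample is passed through
impulse_mean = trimmed_mean(impulse, 3)       # outlier removed
impulse_select = trimming_selection(impulse, 3)  # outlier trimmed to the band
```

Both filters reject the outlier, but on the edge window only the selection filter returns one of the two signal levels; the trimmed mean returns an intermediate value, which is the blurring described above.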
This example illustrates that the selection approach to filtering has clear advantages for certain signals. It is this general category of selection order-statistic filters that we cover in this paper. We begin by giving a brief review of the most well known and thoroughly studied selection order-statistic filter, the median filter. The median filter is also the starting point for many generalizations that have been developed. Therefore, a thorough understanding of the median filter is necessary to fully understand the principles behind the generalizations.
The median filter, as will be shown, possesses many optimality properties. However, the filter offers little flexibility and is temporally blind. That is, all temporal information is lost in the filtering process. Permuting the time ordered observations, for instance, does not alter the filter output. This lack of temporal information causes performance to suffer. As a result, numerous generalizations of the median filter have been introduced that incorporate some form of temporal information ��� �� � �� �� ����
Temporal information can be incorporated into order-statistic filtering through weighting of time ordered samples prior to rank ordering. This leads to the class of Weighted Median filters and Weighted Order Statistic (WOS) filters ����� Through weighting, certain temporal samples can be emphasized while others are deemphasized. This weighting scheme incorporates temporal information and results in considerable performance gain over temporally blind (strictly rank-order) filters. Still, temporal-order weighting followed by rank ordering decouples the temporal information from the rank information during weighting. Due to this decoupling, these filters use only a fraction of the temporal and rank information contained in the two orderings.

Figure 3: The image "aerial" broken into four quadrants: upper left, original; upper right, noisy; lower left, weighted sum filter output; lower right, selection output.
The full temporal and rank information is represented by the mapping that takes one ordering to the other, $p : \mathbf{x} \mapsto \mathbf{x}_L$, where $\mathbf{x}$ and $\mathbf{x}_L$ are vectors containing the temporally and rank ordered observation samples, respectively. The full permutation mapping information can be utilized by coupling the temporal- and rank-order during weighting. This results in the powerful class of Permutation Weighted Order-Statistic (PWOS) filters ���� ���� While the performance achieved by using the full permutation information can be impressive, the explosive growth in the parameter set limits the number of samples for which the full information can be used. To combat this problem, an $L^j\ell$ lattice approach to coupling temporal- and rank-order information is used. In the lattice terminology, $\ell$ and $L$ refer to temporal- and rank-order, respectively. The exponents govern the amount of rank ($j$) coupling used. Thus, the amount of temporal and rank coupling is easily controlled. This offers flexibility in performance as well as control over the parameter set. In addition to the lattice approach, we detail alternative methods for reducing the permutation information while retaining performance gains. To effectively utilize these classes of filters, the parameters must be set appropriately for the task at hand. To this end, we present two adaptive optimization techniques. Lastly, numerous examples are given illustrating the performance of the various filters.
2 The Median Filter
The running median filter was the genesis for the broad array of rank order based filtering techniques that exist today, and that continue to be developed. The running median filter was first suggested as a nonlinear smoother for time series data by Tukey in ���� ���� Since median filters are the foundation upon which current rank order based filtering techniques are based, a thorough understanding of the median filter and its properties is crucial to the development and understanding of current techniques. As such, a brief review of the median filter is given in this section. The review includes formal definitions and a survey of statistical and deterministic properties developed to characterize the median filter performance. Also included is a review of threshold decomposition, which was instrumental in developing many of the median filter properties. This review serves as a starting point for the median filter generalizations developed in the following sections.
2.1 The Running Median Filter
To define the running median filter, let $\{x\}$ be a discrete time sequence. The running median passes a window over the sequence $\{x\}$ that selects, at each instant $n$, an odd number of samples to comprise the observation vector $\mathbf{x}(n)$. The observation window is typically symmetric and centered at $n$, resulting in

$$\mathbf{x}(n) = [x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)]^T, \tag{3}$$

where $N_1$ may range in value over the nonnegative integers and $N = 2N_1 + 1$ is the (odd valued) window size. While processing such non-causal observation vectors has traditionally been referred to as smoothing, we loosen the terminology somewhat and refer to the processing of both causal and non-causal observations as simply filtering. The median filter operating on the input sequence $\{x\}$ produces the output sequence $\{y\}$, where at time index $n$

$$y(n) = \mathrm{MED}[\mathbf{x}(n)] \tag{4}$$
$$\phantom{y(n)} = \text{median value of } [x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)]. \tag{5}$$

That is, the samples in the observation window are sorted and the middle, or median, value is taken as the output.
The input sequence $\{x\}$ may be either finite or infinite in extent. For the finite case, the samples of $\{x\}$ can be indexed as $x(1), x(2), \ldots, x(L)$, where $L$ is the length of the sequence. Due to the symmetric nature of the observation window, the window extends beyond a finite extent input sequence at both the beginning and end. These end effects are generally accounted for by appending $N_1$ samples at the beginning and end of $\{x\}$. Although the appended samples can be arbitrarily chosen, typically these are selected so that the points appended at the beginning of the sequence have the same value as the first signal point, and the points appended at the end of the sequence all have the value of the last signal point.

Figure 4: The operation of the window width 5 median filter (panels: Input, Output; the window moves in the direction labeled Filter Motion). Appended points are marked in the plot.
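A minimal Python sketch of the running median with this end-point convention is given below. The function name `running_median` and the parameter `n1` (standing for $N_1$) are illustrative choices, not notation from the original.

```python
from statistics import median


def running_median(x, n1):
    """Running median with window size N = 2*n1 + 1.

    End effects are handled by appending n1 copies of the first sample
    before the sequence and n1 copies of the last sample after it.
    """
    if not x:
        return []
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    window_size = 2 * n1 + 1
    # The window starting at padded index i is centered on original sample i.
    return [median(padded[i:i + window_size]) for i in range(len(x))]


signal = [1, 1, 5, 1, 1, 3, 3, 3]      # an impulse followed by a step edge
smoothed = running_median(signal, 1)   # window size 3
```

With a window of size three, the isolated value 5 is removed while the step from 1 to 3 is left in place, and the output has the same length as the input.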
To illustrate the appending of the input sequence and the median filtering operation, consider the input signal $\{x\}$ of Figure 4. In this example, $\{x\}$ consists of � observations from a �-level process, fx � x�n� � f� �� � � � � �g� n � �� �� � � � � �g� The figure shows the input sequence and the resulting output sequence for a window size 5 median filter. Note that to account for edge effects, two samples have been appended to both the beginning and end of the sequence. The median filter output at the window location shown in the figure is
y��� � MED�x���� x��� x���� x���� x����� � MED� �� �� �� �� � � � ��
The median filtering operation is clearly nonlinear. As such, the median filter does not possess the superposition property. Thus, traditional frequency and impulse response analyses are not applicable. The impulse response of a median filter with window size $N \ge 3$ is, in fact, zero for all time. Consequently, alternative methods for analyzing and characterizing median filters must be employed. Broadly speaking, two types of analysis have been applied to the characterization of median filters: statistical and deterministic. Statistical properties examine the performance of the median filter, through such measures as optimality and output variance, for the case of white noise time sequences. Conversely, deterministic properties examine the filter output characteristics for specific types of commonly occurring deterministic time sequences. In the following, we review some of the statistical and deterministic properties of running median filters.
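The failure of superposition and the zero impulse response are easy to check numerically. The sketch below assumes a window size of three and end points handled by replicating the first and last samples; the helper `median_filter` and the test signals are hypothetical examples.

```python
from statistics import median


def median_filter(x, n1):
    """Window size 2*n1 + 1 running median with replicated end points."""
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [median(padded[i:i + 2 * n1 + 1]) for i in range(len(x))]


# Impulse response: a unit impulse is removed entirely.
impulse = [0, 0, 0, 1, 0, 0, 0]
impulse_response = median_filter(impulse, 1)

# Superposition: filtering a and b separately and adding the results
# differs from filtering the sum a + b.
a = [0, 2, 0, 0, 0]
b = [0, 0, 2, 0, 0]
sum_of_outputs = [u + v for u, v in zip(median_filter(a, 1), median_filter(b, 1))]
output_of_sum = median_filter([u + v for u, v in zip(a, b)], 1)
```

Each of `a` and `b` is an isolated impulse and is filtered to zero, but their sum contains a two-sample pulse that a window of size three passes unchanged, so `output_of_sum` differs from `sum_of_outputs`.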
2.2 Statistical Properties
The statistical properties of median filters can be examined through the derivation of output distributions and statistical conditions on the optimality of median estimates. These analyses generally assume that the input to the median filter is a constant signal with additive white noise. The assumption that the noise is additive and white is quite natural and is made similarly in the analysis of linear filters. The assumption that the underlying signal is a constant is certainly convenient, but more importantly, often valid. This is especially true for the types of signals median filters are most frequently applied to, such as images. Signals such as images are characterized by regions of constant value separated by sharp transitions, or edges. Thus, the statistical analysis of a constant region is valid for large portions of these commonly used signals. By calculating the output distribution of the median filter over a constant region, the noise smoothing capabilities of the median can be measured through statistics such as the filter output variance.
The median filter properties covered here are for time series signals consisting of white noise observation samples with known distribution. Since the observation sequence is probabilistic, the time index can be dropped and attention focused on a single observation vector. In this case, and others for which the time index $n$ can be dropped without confusion, we do so and denote the observation vector as simply $\mathbf{x} = [x_1, x_2, \ldots, x_N]$. Consider first the case where the observation samples are white noise with a double exponential, or Laplacian, distribution. In this case, the common probability density function (pdf) is given by $f_x(t) = \frac{1}{\sqrt{2\sigma^2}}\, e^{-\frac{\sqrt{2}}{\sigma}|t-\mu|}$, where $\mu$ and $\sigma^2$ are the mean and variance, respectively. For a vector of samples, the joint pdf is

$$f_{\mathbf{x}}(\mathbf{t}) = \left(\frac{1}{\sqrt{2\sigma^2}}\right)^{N} e^{-\frac{\sqrt{2}}{\sigma}\sum_{i=1}^{N}|t_i-\mu|}. \tag{6}$$
Given an observation vector $\mathbf{x}$, the Maximum Likelihood (ML) estimate of the mean, or location parameter, is found by maximizing (6) with $\mathbf{t} = \mathbf{x}$. To simplify the notation, define the distance operator $D_\gamma(\cdot)$ as

$$D_\gamma(\beta) = \sum_{i=1}^{N} |x_i - \beta|^{\gamma}. \tag{7}$$

Then the ML estimate of the location, for Laplacian distributed samples, is the value $\beta$ that minimizes $D_\gamma(\beta)$ with $\gamma = 1$. It is easy to show that

$$\mathrm{MED}[x_1, x_2, \ldots, x_N] = \arg\min_{\beta} D_1(\beta). \tag{8}$$

Thus, the median of the samples $x_1, x_2, \ldots, x_N$ is the value $\beta$ that minimizes $D_1(\beta)$, and consequently, the ML estimate of location for samples with a Laplacian distribution. As a comparison,

$$\mathrm{MEAN}[x_1, x_2, \ldots, x_N] = \frac{1}{N}\sum_{i=1}^{N} x_i = \arg\min_{\beta} D_2(\beta) \tag{9}$$

is the ML estimate of location for samples with a Gaussian distribution. The median and sample mean are, thus, optimal estimates of location for the Laplacian and Gaussian distributions, respectively. This shows that for heavy tailed distributions, such as the Laplacian, the median has advantages over the linear combination based sample mean. A further examination of $D_1(\cdot)$ and $D_2(\cdot)$ reinforces this point. The median is clearly the least absolute error estimate of the center of the distribution for $x_1, x_2, \ldots, x_N$, while the mean is the least squared error estimator. The reliance on the absolute error criterion means that the median is less influenced by outliers than the squared error based mean.
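The two minimization properties can be checked numerically with a brute-force search. The sketch below is an illustration only: the sample values, the search grid, and the names `distance`, `best_l1`, and `best_l2` are assumptions made for this example.

```python
from statistics import mean, median


def distance(samples, beta, gamma):
    """Sum of |x_i - beta|**gamma over the samples."""
    return sum(abs(x - beta) ** gamma for x in samples)


samples = [1.0, 2.0, 3.0, 4.0, 100.0]           # one large outlier
candidates = [k / 10 for k in range(0, 1001)]   # grid from 0.0 to 100.0

# Minimizer of the absolute error criterion (gamma = 1).
best_l1 = min(candidates, key=lambda b: distance(samples, b, 1))
# Minimizer of the squared error criterion (gamma = 2).
best_l2 = min(candidates, key=lambda b: distance(samples, b, 2))
```

The absolute error minimizer coincides with the sample median and stays near the bulk of the data, whereas the squared error minimizer coincides with the sample mean and is pulled toward the outlier.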
Having established the types of signals for which median filters are optimal, the filtering operation can be further characterized through the determination of output distributions. Assume again that the input time series consists of white noise samples with pdf $f_x(\cdot)$ and cumulative distribution function (cdf) $F_x(\cdot)$. Under these conditions on the input samples, it is well known that the median filter output cdf, $F_{med}(\cdot)$, and pdf, $f_{med}(\cdot)$, are given by

$$F_{med}(t) = \sum_{i=N_1+1}^{N} \binom{N}{i} F_x(t)^{i}\,[1 - F_x(t)]^{N-i} \tag{10}$$

and

$$f_{med}(t) = \frac{N!}{N_1!\,N_1!}\, f_x(t)\, F_x(t)^{N_1}\,[1 - F_x(t)]^{N_1}, \tag{11}$$

respectively ����� From these expressions it can be shown that for $t_1$ and $t_2$ such that $F_x(t_1) \le \frac{1}{2} \le F_x(t_2)$, then $F_{med}(t_1) \le \frac{1}{2} \le F_{med}(t_2)$ also holds. By setting $t_1 = t_2 = t_{0.5}$, where by definition $t_{0.5}$ is the point satisfying $F_x(t_{0.5}) = \frac{1}{2}$, we see that the median is statistically unbiased in the sense that the median of the input is the median of the output. Moreover, the median behaves consistently for samples with asymmetric distributions.
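The output cdf is a binomial sum in the input cdf value and is straightforward to evaluate. The sketch below assumes the summation runs from $i = N_1 + 1$ to $N$ for an odd window size $N = 2N_1 + 1$; the function name `median_output_cdf` is hypothetical.

```python
from math import comb


def median_output_cdf(f_x, n):
    """Output cdf of the window size n (odd) median, given the input cdf value f_x = F_x(t)."""
    n1 = (n - 1) // 2
    # The median is <= t exactly when at least n1 + 1 of the n samples are <= t.
    return sum(comb(n, i) * f_x ** i * (1 - f_x) ** (n - i)
               for i in range(n1 + 1, n + 1))


at_input_median = median_output_cdf(0.5, 5)   # input median maps to output median
below = median_output_cdf(0.3, 7)
above = median_output_cdf(0.7, 7)
```

Evaluating at an input cdf value of one half returns one half for any odd window size, which is the unbiasedness property stated above, and values below or above one half stay on the same side.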
The calculation of statistics such as the output mean and variance from the expressions in (10) and (11) is often quite difficult. Insight into the smoothing characteristics of the median filter can, however, be gained by examining the asymptotic behavior ($N \to \infty$) of these statistics, where, under some general assumptions, results can be derived. For the case of white noise input samples, the asymptotic mean, $\mu_{med}$, and variance, $\sigma^2_{med}$, of the median filter output are

$$\mu_{med} = t_{0.5} \tag{12}$$

and

$$\sigma^2_{med} = \frac{1}{4N\,[f_x(t_{0.5})]^2}. \tag{13}$$
Thus, the median produces a consistent ($\lim_{N\to\infty} \sigma^2_{med} = 0$) and unbiased estimate of the input distribution median, irrespective of the input distribution. Note that the output variance is not proportional to the input variance, but rather to $1/f_x^2(t_{0.5})$. For heavy tailed noise such as impulsive noise, $1/f_x^2(t_{0.5})$ is not related to the input variance: the input variance is governed by the impulse magnitude, whereas $1/f_x^2(t_{0.5})$ is not. Thus, the output variance of the median in this case is not proportional to the input variance. This is not true for the sample mean, whose output variance is always proportional to the input variance, and further explains the more robust behavior of the median.

Mean and Median Filter Output Variance

Input sample pdf                                                                                      Mean            Median
Uniform:    $f_x(t) = 1/\sqrt{12\sigma^2}$ for $-\sqrt{3\sigma^2} \le t \le \sqrt{3\sigma^2}$, 0 otherwise   $\sigma^2/N$    $3\sigma^2/(N+2)$
Gaussian:   $f_x(t) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(t-\mu)^2}$                        $\sigma^2/N$    $\pi\sigma^2/(2N)$
Laplacian:  $f_x(t) = \frac{1}{\sqrt{2\sigma^2}}\, e^{-\frac{\sqrt{2}}{\sigma}|t-\mu|}$                          $\sigma^2/N$    $\sigma^2/(2N)$

Table 1: Asymptotic output variances for the window size $N$ mean and median filters for white input samples with uniform, Gaussian, and Laplacian distributions.

The variances for the sample mean and median filter output are given in Table 1 for the uniform, Gaussian, and Laplacian input distribution cases ����� The results hold for all $N$ in the uniform case and are asymptotic for the Gaussian and Laplacian cases. Note that the median performs about 3 dB better than the sample mean for the Laplacian case and 2 dB worse in the Gaussian case.
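As a check on the Gaussian and Laplacian entries, the asymptotic variance $1/(4N[f_x(t_{0.5})]^2)$ can be evaluated directly. The short derivation below is a worked sketch that assumes symmetric densities, so that $t_{0.5} = \mu$, and compares each result with the sample mean variance $\sigma^2/N$.

```latex
\begin{align*}
\text{Gaussian:}\quad
  f_x(\mu) &= \frac{1}{\sqrt{2\pi\sigma^2}}
  &\Rightarrow\quad
  \sigma^2_{med} &= \frac{2\pi\sigma^2}{4N} = \frac{\pi\sigma^2}{2N},
  &10\log_{10}\!\left(\frac{\pi}{2}\right) &\approx 1.96\ \text{dB},\\[4pt]
\text{Laplacian:}\quad
  f_x(\mu) &= \frac{1}{\sqrt{2\sigma^2}}
  &\Rightarrow\quad
  \sigma^2_{med} &= \frac{2\sigma^2}{4N} = \frac{\sigma^2}{2N},
  &10\log_{10}\!\left(\frac{1}{2}\right) &\approx -3.01\ \text{dB}.
\end{align*}
```

The ratios $\pi/2$ and $1/2$ relative to $\sigma^2/N$ account for the roughly 2 dB loss in the Gaussian case and the roughly 3 dB gain in the Laplacian case.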
The median filter possesses numerous statistical properties in addition to those discussed above. Among those properties that illustrate the optimality of the median are ����
1. The conditional median at each time instant $n$ is the minimum Mean Absolute Error (MAE) estimator of the signal value at time $n$, where the conditioning is on the past history up to time $n$ of the noise corrupted observations of the signal.
2. The running median is, with high probability, a maximum a posteriori estimator of a constant signal in symmetric impulsive noise.
These statistical properties are complemented by a set of deterministic properties, which are discussed next.
2.3 Deterministic Properties
Statistical properties give considerable insight into the median filter performance. The median filter cannot, however, be sufficiently characterized through statistical properties alone. For instance, an important question not answered by the statistical properties is what type of signal, if any, is passed through the median filter unaltered. Linear filters, for example, can be analyzed in the frequency domain to determine, among other things, pass- and stop-band frequencies. If the frequency content of the input signal lies exclusively in the filter passband, then the signal passes through the filter unaltered.¹ Conversely, signal content in the stop band does not pass through, or is at least attenuated by, the filter. Somewhat analogous results do in fact exist for the median filter. For median filters, passband or invariant signals are referred to as root signals. The concept of root signals is important to the understanding of median filters and their effect on general signal structures. A review of the significant results in root signal analysis is given in the following along with the main median filter properties resulting from this analysis.
The definition of a root signal is quite simple: a signal is a median filter root if the signal is invariant under the median filtering operation. Thus, a signal $\{x\}$ is a root of the window size $N = 2N_1 + 1$ median filter if

$$x(n) = \mathrm{MED}[x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)] \tag{14}$$

for all $n$. As an example, consider the signal shown in Fig. 5. This signal is filtered by three different window size median filters ($N_1 = 1$, 2, and 3). Note that for the window size three case ($N_1 = 1$), the filter output is a root. That is, further filtering of this signal with the window size three median filter does not alter the signal. Notice, however, that if this same signal is filtered with a larger window size median, the signal will be modified. Thus, the signal in Fig. 5(b) is in the passband, or a root, of an $N_1 = 1$ median filter but outside the passband, or not a root, of the $N_1 = 2$ and $N_1 = 3$ filters.
The goal of root analysis is to relate the filtering of desired signals corrupted by noise to root and non-root signals. If it can be shown that certain types of desired signals are in the median filter root set, while noise is outside the root set, then the median filtering of a time series will preserve desired structures while altering the noise. Such a result does in fact hold and will be made clear through the following definitions and properties. First note that, as the example above illustrates, whether or not a signal is a median filter root depends on the window size of the filter in question. Clearly, all signals are roots of the window size one median (identity) filter. To investigate this dependence on window size, median filter root signals can be characterized in terms of local signal structures, where the local signal structures are related to the filter window size. Such a local structure based analysis serves two purposes. First, it defines signal structures that, when properly combined, form the median filter root set. Second, by relating the local structures to the filter window size, the effect of window size on roots is made clear. The local structure analysis of median filter roots relies on the following definitions.
Definition 2.1 A constant neighborhood is a region of at least $N_1 + 1$ consecutive identically valued points. □

¹ In general, the pass-band is defined in terms of the magnitude response. Thus, there may be some time shifting of signals in the pass-band, depending on the filter phase response.

Figure 5: Effects of window size on a median filtered signal (panels: input signal $x(n)$; output signals for windows of size 3, 5, and 7). Appended points are marked in the plot.
Definition 2.2 An edge is a monotonic region between two constant neighborhoods of different value. The connecting monotonic region cannot contain any constant neighborhoods. □

Definition 2.3 An impulse is a constant neighborhood followed by at least one, but no more than $N_1$, points which are then followed by another constant neighborhood having the same value as the first constant neighborhood. The two boundary points of these at most $N_1$ points do not have the same value as the two constant neighborhoods. □

Definition 2.4 An oscillation is a sequence of points which is not part of a constant neighborhood, an edge, or an impulse. □
These definitions may now be used to develop a description of those signals which do and those which do not pass through a median filter without being perturbed. In particular, Gallagher and Wise ���� have developed a number of properties which characterize these signal sets for the case of finite length sequences. First, any impulse will be eliminated upon median filtering. Secondly, a finite length signal is a median filter root if it consists of constant neighborhoods and edges only. Thus, if a desired signal is constructed solely of constant neighborhoods and edges, then it will not be altered by the median filtering operation. Conversely, if observation noise consists of impulses (as defined above), it will be removed by the median filtering operation. These median filter root properties are made exact by the following.
Definition 2.5 A sequence $\{x\}$ is said to be locally monotonic of length $m$, denoted LOMO($m$), if the subsequence $x(n), x(n+1), \ldots, x(n+m-1)$ is monotonic for all $n \ge 1$. □

Property 2.1 Given a length $L$ sequence to be median filtered with a length $N = 2N_1 + 1$ window, a necessary and sufficient condition for the signal to be invariant (a root) under median filtering is that the extended (beginning and end appended) signal be LOMO($N_1 + 2$). □
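The local monotonicity condition can be compared against a direct root test in a short Python sketch. The helper names `is_lomo`, `is_root`, and `extend`, and the example signal, are illustrative assumptions; the extension replicates the first and last samples $N_1$ times.

```python
from statistics import median


def extend(x, n1):
    """Append n1 copies of the first and last samples to the sequence."""
    return [x[0]] * n1 + list(x) + [x[-1]] * n1


def is_lomo(x, m):
    """True if every length-m subsequence of x is monotonic."""
    for start in range(len(x) - m + 1):
        seg = x[start:start + m]
        nondecreasing = all(a <= b for a, b in zip(seg, seg[1:]))
        nonincreasing = all(a >= b for a, b in zip(seg, seg[1:]))
        if not (nondecreasing or nonincreasing):
            return False
    return True


def is_root(x, n1):
    """Direct test: the signal equals its own median filtered version."""
    padded = extend(x, n1)
    out = [median(padded[i:i + 2 * n1 + 1]) for i in range(len(x))]
    return out == list(x)


pulse = [0, 0, 2, 2, 0, 0]   # a two-sample pulse between constant regions
root_w3 = is_root(pulse, 1)                    # window size 3
lomo_w3 = is_lomo(extend(pulse, 1), 1 + 2)     # LOMO(3) on the extended signal
root_w5 = is_root(pulse, 2)                    # window size 5
lomo_w5 = is_lomo(extend(pulse, 2), 2 + 2)     # LOMO(4) on the extended signal
```

For this signal the two tests agree at both window sizes: the pulse is locally monotonic of length three and a root of the window size three filter, but it fails both tests for window size five.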
Thus, the set of signals that forms the passband or root set (invariant to filtering) of a size $N$ median filter consists solely of those signals that are formed of constant neighborhoods and edges. Note that by the definition of LOMO($m$), a change of trend implies that the sequence must stay constant for at least $m - 1$ points. It follows that for a median filter root signal to contain both increasing and decreasing regions, these regions must be separated by a constant neighborhood of at least $N_1 + 1$ identically valued samples. It is also clear from the definition of LOMO($\cdot$) that a LOMO($m_1$) sequence is also LOMO($m_2$) for any two positive integers $m_1 \ge m_2$. This implies that the roots for decreasing window size median filters are nested, i.e., every root of a window size $M$ filter is also a root of a window size $N$ median filter for all $N < M$. This is formalized by:

Property 2.2 Let $S$ denote a set of finite length sequences and $R_{N_1}$ be the root set of the window size $N = 2N_1 + 1$ median filter operating on $S$. Then the root sets are nested such that $\cdots \subseteq R_{N_1+1} \subseteq R_{N_1} \subseteq R_{N_1-1} \subseteq \cdots \subseteq R_1 \subseteq R_0 = S$. □
In addition to the above description of the root signal set for a median filter, it can be shown that any signal of finite length is mapped to a root signal by repeated median filtering. In fact, it is simple to show that the first and last points to change value on a median filtering operation remain invariant upon additional filter passes, where repeated filter passes consist of using the output of the prior filter pass for the input of an identical filter on the current pass. This fact, in turn, indicates that any $L$ long nonroot signal (oscillations and impulses) will become a root structure after a maximum of $(L - 2)/2$ successive filterings. This simple bound was improved in ���� where it was shown that at most

$$3\left\lceil \frac{L-2}{2(N_1+2)} \right\rceil \tag{15}$$

passes of the median filter are required to reach a root. This bound is conservative in practice since in most cases a root signal is obtained after ten or so filter passes.
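Convergence to a root under repeated filtering can be observed with a small Python sketch. The helpers `median_filter`, `filter_to_root`, and `pass_bound` are hypothetical names, end points are handled by replication, and the bound is taken to be three times the ceiling of $(L-2)/(2(N_1+2))$.

```python
from math import ceil
from statistics import median


def median_filter(x, n1):
    """Window size 2*n1 + 1 running median with replicated end points."""
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [median(padded[i:i + 2 * n1 + 1]) for i in range(len(x))]


def filter_to_root(x, n1):
    """Filter repeatedly until the signal stops changing.

    Returns the root and the number of passes that altered the signal.
    """
    current = list(x)
    passes = 0
    while True:
        nxt = median_filter(current, n1)
        if nxt == current:
            return current, passes
        current = nxt
        passes += 1


def pass_bound(length, n1):
    """Upper bound on the number of passes needed to reach a root."""
    return 3 * ceil((length - 2) / (2 * (n1 + 2)))


oscillation = [0, 1, 0, 1, 0, 1, 0, 1]
root, passes = filter_to_root(oscillation, 1)   # window size 3
bound = pass_bound(len(oscillation), 1)
```

For this alternating signal each pass removes one oscillation from each end, so the root (a single step edge) is reached after three passes, which here meets both the simple $(L-2)/2$ bound and the improved bound.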
Figure 6: Root signals obtained by median filters of size 3, 5, and 7 (panels: input signal $x(n)$; root signal for a window of size 3, 1 filter pass; root signal for a window of size 5, 2 filter passes; root signal for a window of size 7, 2 filter passes). Appended points are marked in the plot.
The median filter root properties are illustrated through an example in Fig. 5. This figure shows an original signal and the resultant root signals after multiple passes of window size 3, 5, and 7 median filters. Note that while it takes only a single pass of the window size 3 median filter to obtain a root, it takes two passes for the window size 5 and 7 median filters. Clearly, the locally monotonic structure requirements of the root signals are satisfied in Fig. 5. For the window size 3 case, the input sequence becomes LOMO(3) after a single pass of the filter. Thus, this sequence is in the root set of the window size 3 median filter, but is not a root of the window size $N > 3$ median filter since it is not LOMO($N$) for $N > 3$.

The deterministic and statistical properties form a powerful set of tools for describing the median filtering operation and performance. Together, they show that the median filter is an optimal estimator of location for Laplacian noise and that common signal structures, e.g., constant neighborhoods and edges in images, are in the filter pass-band (root set). Moreover, impulses are removed by the filtering operation, and repeated passes of the median filter always result in the signal converging to a root, where the root consists of a well defined set of structures related to the filter window size.
��� Median Filtering and Threshold Decomposition
A fundamental property of median filters is threshold decomposition ���� This property was the key to deriving many of the median filter statistical and deterministic properties. Moreover, threshold decomposition is instrumental in the optimization of the median filter generalizations discussed in the following sections. A review of this important property is therefore in order.

Threshold decomposition is simply a means of decomposing an $M$-level signal into an equivalent set of $M - 1$ binary sequences. (For now we restrict the discussion to quantized signals. This restriction is lifted later, when threshold decomposition is extended to real-valued signals.) Let $\mathbf{x}(n) = [x_1, x_2, \ldots, x_N]$ be an $N$ element observation vector where the signal is quantized to $M$ levels such that $x(n) \in Z_M = \{0, 1, \ldots, M-1\}$. The threshold decomposition of $\mathbf{x}(n)$ results in the set of binary vectors $\mathbf{X}^1(n), \mathbf{X}^2(n), \ldots, \mathbf{X}^{M-1}(n)$, where $\mathbf{X}^i(n) \in \{0, 1\}^N$ is the observation vector thresholded at level $i$ for $i = 1, 2, \ldots, M - 1$. As a function of the threshold operator $T^i(\cdot)$,
\[
\begin{aligned}
\mathbf{X}^i(n) &= T^i(\mathbf{x}(n)) \\
&= [T^i(x_1), T^i(x_2), \ldots, T^i(x_N)] \\
&= [X^i_1, X^i_2, \ldots, X^i_N],
\end{aligned}
\]
where $T^i(\cdot)$ is defined as
\[
X^i_j = T^i(x_j) =
\begin{cases}
1 & \text{if } x_j \ge i \\
0 & \text{otherwise}
\end{cases}
\]
for $i = 1, 2, \ldots, M - 1$ and $j = 1, 2, \ldots, N$. In terms of the time indexed samples, $X^i(n) = T^i(x(n))$. Threshold decomposition can be reversed by simply adding the threshold decomposed signals,
\[ \mathbf{x}(n) = \sum_{i=1}^{M-1} \mathbf{X}^i(n) \quad \text{and} \quad x(n) = \sum_{i=1}^{M-1} X^i(n). \]

Furthermore, it was shown by Fitch et al. that the median operation commutes with thresholding ����� Stated more formally, the median filtering of an $M$-level signal $x(n) \in \{0, 1, \ldots, M-1\}$ is equivalent to filtering the $M - 1$ threshold signals $X^1(n), X^2(n), \ldots, X^{M-1}(n)$ and summing the results,
\[ \mathrm{MED}[\mathbf{x}(n)] = \sum_{i=1}^{M-1} \mathrm{MED}[\mathbf{X}^i(n)] \]
for all $n$. Thus, threshold decomposition is a weak superposition property. A related property is the partial ordering property known as the stacking property.

Definition ��� Let $\mathbf{X}$ and $\mathbf{Y}$ be $N$ element binary vectors. Then $\mathbf{X}$ stacks on $\mathbf{Y}$, which is denoted as $\mathbf{Y} \le \mathbf{X}$, if and only if $Y_i \le X_i$ for $i = 1, 2, \ldots, N$. A function $f(\cdot)$ possesses the stacking property if and only if
\[ \mathbf{Y} \le \mathbf{X} \;\Rightarrow\; f(\mathbf{Y}) \le f(\mathbf{X}). \]
Figure 6: Median filtering by threshold decomposition. The 4-valued input signal is filtered by the running sorting method in the top part of the figure. In the bottom part of the figure, the signal is first decomposed into a set of binary signals and each of these is filtered by a binary median filter. The output is produced by adding together the outputs of the binary median filters. (Diagram labels: Median Filter; Threshold at 1, 2, and 3; three Binary Med. Filter blocks; Add binary outputs.)
The median filter was shown to possess the stacking property ����� which can be stated as follows. In the threshold decomposition domain, the binary median filter output at threshold level $i$ is always less than or equal to the binary median filter output at lower threshold levels,
\[ \mathrm{MED}[\mathbf{X}^i(n)] \le \mathrm{MED}[\mathbf{X}^j(n)] \]
for all $i, j$ such that $1 \le j < i \le M - 1$.

The stacking property is a partial ordering property. It states that the results of applying the median filter to each of the binary sequences obtained by thresholding the original signal have a specific structure. Thus, in median filtering by threshold decomposition, the input sequence is first decomposed into $M - 1$ binary sequences, and each of these is then filtered by a binary median filter. Furthermore, the set of output sequences possesses the stacking property. As a simple example, consider the median filter of window size three ($N = 3$) being applied to a 4-level input signal as shown in Fig. 6. The outputs of the multi-level median filter and of the threshold decomposition median filter are identical because of the weak superposition property.
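The weak superposition property can be checked numerically. The following Python sketch filters a 4-level signal directly and by threshold decomposition; the input values, the function names, and the end-point replication are assumptions chosen for illustration and are not taken from the figure.

```python
def running_median(x, n1):
    """Window size 2*n1 + 1 running median with replicated end points."""
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [sorted(padded[i:i + 2 * n1 + 1])[n1] for i in range(len(x))]


def threshold(x, level):
    """Binary signal: 1 where the sample is at or above the level."""
    return [1 if v >= level else 0 for v in x]


def median_by_threshold_decomposition(x, n1, num_levels):
    """Filter each of the num_levels - 1 binary signals, then add them."""
    binary_outputs = [running_median(threshold(x, i), n1)
                      for i in range(1, num_levels)]
    return [sum(column) for column in zip(*binary_outputs)]


x = [1, 1, 0, 2, 0, 3, 3, 1, 2, 2]   # hypothetical 4-level input signal
direct = running_median(x, 1)
decomposed = median_by_threshold_decomposition(x, 1, 4)
```

Both routes give the same output sequence, and the binary outputs at higher threshold levels never exceed those at lower levels, which is the stacking property.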
3 Weighted Median Filters
Numerous generalizations to the median filtering operation have been introduced since Tukey first suggested the median filter as a smoother in ���� ���� While many different approaches have been taken in an attempt to improve the median filter performance, most have, in some way, attempted to include temporal information into the filtering process. For most signals, and certainly those of practical interest, it is clear that certain observation samples have a higher degree of correlation with the desired estimate than do others. In the linear filter case, this correlation is reflected in the weight given each sample. A similar weighting approach can be taken to generalize the median filter.

The sample weighting approach to generalizing the median filter is developed in this section. We begin by discussing the Center Weighted Median (CWM) filter, in which only one sample, the sample centrally located in the observation window, is weighted. This is then generalized to the Weighted Median (WM) filter case in which all observation samples are weighted. In both the CWM and WM filter cases the output is the median value of the weighted set. A further generalization can be achieved by allowing the output to be an order statistic other than the median. This leads to the class of Weighted Order Statistic (WOS) filters. Following the development of these generalizations, we show that each possesses the threshold decomposition property. As noted earlier, threshold decomposition is an extremely powerful tool for both filter analysis and optimization, and is the final topic covered in this section.
3.1 Center Weighted Median Filters
The median filter is strictly a rank order operator. Thus, all temporal locations within the observation window are considered equivalent. That is, given a window of observation samples, any permutation of the samples within the observation window results in an identical median filter output. As stated above, for most signals certain samples within the observation window are more correlated with the desired estimate than are others. Due to the symmetric nature of the observation window, the sample most correlated with the desired estimate is, in general, the center observation sample.

The center observation sample can be weighted to reflect its importance, or correlation with the desired estimate. Since median filters select the output in a different fashion than do linear filters, i.e., ranking versus summing, the observation samples must also be weighted differently. In the median filtering case, weighting is accomplished through repetition. Thus, the output of the CWM filter is given by
\[ y(n) = \mathrm{MED}[x_1, \ldots, x_{c-1}, x_c \diamond w_c, x_{c+1}, \ldots, x_N], \]
where $x_c \diamond w_c$ denotes the replication operator, $x_c \diamond w_c = \overbrace{x_c, x_c, \ldots, x_c}^{w_c \text{ times}}$, and $c = (N+1)/2 = N_1 + 1$ is the index of the center sample. The center sample is thus repeated $w_c$ times, where $w_c$ is an odd positive integer. Consequently, the output of the CWM filter is the median over an extended set containing multiple center samples. When $w_c = 1$, the operator is a median filter, and for $w_c \ge N$, the CWM reduces to an identity operation. On the right side of the CWM definition above, the time index $n$ has been dropped for notational simplicity and the observation samples are indexed according to their location in the observation window. In terms of the time series, the samples in the observation window are $x_i = x(n - (N_1 + 1) + i)$ for $i = 1, 2, \ldots, N$.

Figure 7: Effects of increasing the center weight of a CWM filter of size $N = 9$ operating on the voiced speech "a". The CWM filter output is shown for $w_c = 1, 3, 5, 7, 9$. Note that for $w_c = 1$ the CWM reduces to the median filter, and for $w_c = 9$ it becomes the identity filter. (Horizontal axis: time n; vertical axis: weight w.)

The effect of varying the center sample weight is perhaps best seen by way of an example. Consider a segment of recorded speech. The voiced waveform "a" is shown at the top of Fig. 7. This speech signal is taken as the input of a CWM filter of size 9. The outputs of the CWM, as the weight parameter $w_c$ is varied from 1 to 9, are also shown in Fig. 7. The vertical index denotes the value given to $w_c$. The signal at the top is the original signal, or the output signal of the CWM when $w_c = N$, or 9 in this example. The second signal from the top is the CWM filtered signal with $w_c = N - 2$. The weight $w_c$ is successively decreased until $w_c = 1$, in which case the CWM filter reduces to the standard median.
The smoothing characteristics of the CWM filter, as a function of the center sample weight, are illustrated in the previous example and figure. Clearly, as $w_c$ is increased less smoothing occurs. This response of the CWM filter is explained by the following property, which relates the weight $w_c$ and the CWM filter output to select order statistics (OS). The $N$ observation samples $x_1, x_2, \ldots, x_N$ can be written as an OS vector,
\[ \mathbf{x}_L = [x_{(1)}, x_{(2)}, \ldots, x_{(N)}], \]
where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$. The following relation ��� �� utilizes this notation.

Figure 8: The center weighted median filtering operation. The center observation sample is mapped to the order statistic $x_{(k)}$ ($x_{(N+1-k)}$) if the center sample is less (greater) than $x_{(k)}$ ($x_{(N+1-k)}$), and left unaltered otherwise. (Axis marks: $x_{(1)}$, $x_{(k)}$, $x_{(N+1-k)}$, $x_{(N)}$.)
Property ��� Let $\{y\}$ be the output of a CWM filter operating on the sequence $\{x\}$. Then
\[
\begin{aligned}
y(n) &= \mathrm{MED}[x_1, \ldots, x_{c-1}, x_c \diamond w_c, x_{c+1}, \ldots, x_N] \\
&= \mathrm{MED}[x_{(k)}, x_c, x_{(N-k+1)}],
\end{aligned}
\]
where $k = (N + 2 - w_c)/2$ for $1 \le w_c \le N$, and $k = 1$ for $w_c > N$.
From this property we can write the CWM filter output $y(n)$ as
\[
y(n) =
\begin{cases}
x_c & \text{if } x_{(k)} \le x_c \le x_{(N+1-k)} \\
x_{(k)} & \text{if } x_c < x_{(k)} \\
x_{(N+1-k)} & \text{if } x_c > x_{(N+1-k)}.
\end{cases}
\]
Since $x(n)$ is the center sample in the observation window, i.e., $x_c = x(n)$, the expression above indicates that the output of the filter is identical to the input as long as $x(n)$ lies in the interval $[x_{(k)}, x_{(N+1-k)}]$. If the center input sample is greater than $x_{(N+1-k)}$ the filter outputs $x_{(N+1-k)}$, guarding against a high rank order (large) aberrant data point being taken as the output. Similarly, the filter's output is $x_{(k)}$ if the sample $x(n)$ is smaller than this order statistic. This CWM filter performance characteristic is illustrated in Figs. 8 and 9. Figure 8 shows how the input sample is left unaltered if it is between the trimming statistics $x_{(k)}$ and $x_{(N+1-k)}$ and mapped to one of these statistics if it is outside this range. Figure 9 shows an example of the CWM filter operating on a Laplacian sequence. Along with the input and output, the trimming statistics are shown. It is easily seen how increasing $k$ tightens the range in which the input is passed directly to the output.
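The equivalence between center sample replication and trimming by order statistics can be sketched as follows. This is a minimal Python illustration with hypothetical function names; it assumes an odd window length and an odd positive center weight, as in the text.

```python
def cwm_by_replication(window, wc):
    """CWM output: median of the window with the center sample repeated wc times."""
    c = len(window) // 2                      # zero-based center index
    expanded = list(window[:c]) + [window[c]] * wc + list(window[c + 1:])
    expanded.sort()
    return expanded[len(expanded) // 2]


def cwm_by_trimming(window, wc):
    """Same output via the trimming order statistics x_(k) and x_(N+1-k)."""
    n = len(window)
    k = (n + 2 - wc) // 2 if wc <= n else 1
    ordered = sorted(window)
    lower, upper = ordered[k - 1], ordered[n - k]   # x_(k), x_(N+1-k)
    center = window[n // 2]
    # Pass the center sample if it lies in [lower, upper], else clip it.
    return min(max(center, lower), upper)
```

For the window `[5, 1, 9, 2, 7]` with `wc = 3`, both functions clip the large center sample 9 to the upper trimming statistic 7.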
3.2 Weighted Median Filters
The weighting scheme used by CWM filters can be naturally extended to include all input samples. To this end, let $\mathbf{w} = [w_1, w_2, \ldots, w_N]$ be an $N$ long weight vector with positive integer elements that sum to an odd number, i.e., $\sum_{i=1}^{N} w_i$ is odd. Given this vector of weights, the WM filter operation is defined as ��
\[
\begin{aligned}
y(n) &= \mathrm{MED}[\mathbf{x}(n) \diamond \mathbf{w}] \\
&= \mathrm{MED}[x_1 \diamond w_1, x_2 \diamond w_2, \ldots, x_N \diamond w_N].
\end{aligned}
\]
Thus, WM filters incorporate temporal order information by weighting samples according to their temporal order prior to rank filtering. The filtering operation is illustrated through the following example.

Figure 9: An example of the CWM filter operating on an i.i.d. Laplacian sequence with unit variance. Shown are the filter input and output sequences as well as the trimming statistics $x_{(k)}$ and $x_{(N+1-k)}$. The filter window size is �� and k � �� (Legend: input signal, trimming order statistics, filter output.)
Example ��� Consider the window size 5 WM filter defined by the symmetric weight vector $\mathbf{w} = [1, 2, 3, 2, 1]$. For the observation $\mathbf{x}(n) = [12, 6, 4, 1, 9]$, the filter output is found as
\[
\begin{aligned}
y(n) &= \mathrm{MED}[12 \diamond 1,\; 6 \diamond 2,\; 4 \diamond 3,\; 1 \diamond 2,\; 9 \diamond 1] \\
&= \mathrm{MED}[12, 6, 6, 4, 4, 4, 1, 1, 9] \\
&= \mathrm{MED}[1, 1, 4, 4, \underline{4}, 6, 6, 9, 12] \\
&= 4,
\end{aligned}
\]
where the median value is underlined. The large weighting on the center input sample results in this sample being taken as the output. As a comparison, the standard median output for the given input is $y(n) = 6$.
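The replication based definition translates directly into code. The sketch below is a minimal Python illustration with a hypothetical function name; it assumes positive integer weights with an odd sum, and the sample values are chosen for illustration.

```python
def weighted_median_replication(x, w):
    """WM output for positive integer weights whose sum is odd.

    Each sample x[i] is repeated w[i] times and the median of the
    expanded set is returned.
    """
    expanded = []
    for sample, weight in zip(x, w):
        expanded.extend([sample] * weight)
    expanded.sort()
    return expanded[len(expanded) // 2]


x = [12, 6, 4, 1, 9]                     # illustrative observation window
heavy_center = weighted_median_replication(x, [1, 2, 3, 2, 1])
plain_median = weighted_median_replication(x, [1, 1, 1, 1, 1])
```

With unit weights the function reduces to the standard median, so the two calls show how a heavily weighted center sample can change the selected output.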
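Figure 10: The weighted median filtering operation. (Diagram: the input sequence {x} passes through the observation window $x_1, x_2, x_3, x_4, \ldots, x_N$; the samples are weighted by $w_1, w_2, w_3, w_4, \ldots, w_N$ and the MED operator produces the output sequence {y}.)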
The WM filtering operation can be schematically described as in Fig. 10. This figure illustrates that as the filter window slides over an input sequence, the observation samples are duplicated (weighted) according to their temporal order within the window. This replication forms an expanded observation set which is then ordered according to rank, and the median sample selected as the output. In this fashion specific temporal order samples can be emphasized, and others de-emphasized. The figure also illustrates that structurally, the WM filter is similar to the linear FIR filter. This relationship between linear and WM filters can be further explored through an alternative WM filter definition.

The constraint that the WM filter weights be integer valued can be relaxed through a second, equivalent, filter definition. Thus, let $\mathbf{w}$ be an $N$ element weight vector with positive (possibly) non-integer elements. The output of the WM filter defined by $\mathbf{w}$ and operating on the observation $\mathbf{x}(n)$ can be defined as
\[ y(n) = \arg\min_{\beta} D^1_{\mathbf{w}}(\beta), \]
where $D^1_{\mathbf{w}}(\cdot)$ is the weighted distance operator
\[ D^1_{\mathbf{w}}(\beta) = \sum_{i=1}^{N} w_i\, |x_i - \beta|. \]
Note that $D^1_{\mathbf{w}}(\beta)$ is piecewise linear and convex for $w_i \ge 0$, $i = 1, 2, \ldots, N$. Thus, $\arg\min_{\beta} D^1_{\mathbf{w}}(\beta)$ is guaranteed to be one of the input samples $x_1, x_2, \ldots, x_N$. The WM filter output for non-integer weights can be determined from the definition above as follows:
1. Calculate the threshold $w_0 = \frac{1}{2} \sum_{i=1}^{N} w_i$.
2. Sort the samples in the observation vector $\mathbf{x}(n)$.
3. Sum the weights corresponding to the sorted samples beginning with the maximum sample and continuing down in order.
4. The output is the sample whose weight causes the sum to become $\ge w_0$.

The following example illustrates this procedure.
Example ��� Consider the window size 5 WM filter defined by the real valued weights $\mathbf{w} = [0.1, 0.2, 0.3, 0.2, 0.1]$. The output for this filter operating on the observation $\mathbf{x}(n) = [12, 6, 4, 1, 9]$ is found as follows. Summing the weights gives the threshold $w_0 = \frac{1}{2} \sum_{i=1}^{5} w_i = 0.45$. The observation samples, sorted observation samples, their corresponding weights, and the partial sums of weights (from each ordered sample to the maximum) are:

observation samples            12,   6,    4,    1,    9
corresponding weights          0.1,  0.2,  0.3,  0.2,  0.1

sorted observation samples     1,    4,    6,    9,    12
corresponding weights          0.2,  0.3,  0.2,  0.1,  0.1
partial weight sums            0.9,  0.7,  0.4,  0.2,  0.1

Thus, the output is 4 since, when starting from the right (maximum sample) and summing the weights, the threshold $w_0 = 0.45$ is not reached until the weight associated with 4 is added. The partial sum 0.7 is the first sum which meets or exceeds the threshold.
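The four step procedure for real valued weights can be sketched as below. This is a minimal Python illustration, not a reference implementation; the function name is hypothetical and the weights are assumed to be positive.

```python
def weighted_median(x, w):
    """WM output for positive, possibly non-integer weights.

    Follows the threshold procedure: compute w0 as half the weight sum,
    sort the samples, accumulate weights from the maximum sample
    downward, and return the sample whose weight makes the running sum
    reach w0.
    """
    w0 = 0.5 * sum(w)
    pairs = sorted(zip(x, w), key=lambda p: p[0], reverse=True)
    running = 0.0
    for sample, weight in pairs:
        running += weight
        if running >= w0:
            return sample
    raise ValueError("weights must be positive")


x = [12, 6, 4, 1, 9]                                  # illustrative window
y_real = weighted_median(x, [0.1, 0.2, 0.3, 0.2, 0.1])
y_scaled = weighted_median(x, [1, 2, 3, 2, 1])        # same weights times 10
```

Scaling all weights by a positive constant scales the threshold by the same constant, so `y_real` and `y_scaled` select the same sample.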
In the previous section the median and sample mean filters were related through the distance operator $D^p(\cdot)$. There, it was shown that $\mathrm{MED}(\mathbf{x}(n)) = \arg\min_{\beta} D^1(\beta)$ while $\mathrm{MEAN}(\mathbf{x}(n)) = \arg\min_{\beta} D^2(\beta)$. Similar results hold relating the WM and linear FIR filters by means of the weighted distance measure $D^p_{\mathbf{w}}(\cdot)$. As stated above, the WM of $\mathbf{x}(n)$ is $\arg\min_{\beta} D^p_{\mathbf{w}}(\beta)$ for $p = 1$. Interestingly, if the distance norm is changed to two, then
\[ \arg\min_{\beta} D^2_{\mathbf{w}}(\beta) = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}, \]
which is a normalized linear FIR filter �����
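The two minimizations can be compared numerically. In the Python sketch below, the weighted distance is taken to be the sum of $w_i |x_i - \beta|^p$; the function name, the symbol names, and the sample values are assumptions made for illustration.

```python
def weighted_distance(x, w, beta, p):
    """Weighted distance: sum of w_i * |x_i - beta|**p."""
    return sum(wi * abs(xi - beta) ** p for xi, wi in zip(x, w))


x = [12, 6, 4, 1, 9]
w = [0.1, 0.2, 0.3, 0.2, 0.1]

# p = 1: the minimizer is one of the input samples (the weighted median),
# so it is enough to search over the samples themselves.
l1_minimizer = min(x, key=lambda beta: weighted_distance(x, w, beta, 1))

# p = 2: the minimizer has a closed form, the normalized weighted mean.
l2_minimizer = sum(wi * xi for xi, wi in zip(x, w)) / sum(w)
```

The first result is a selection from the input set, while the second is a weighted sum that generally does not coincide with any input sample.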
Before ending the discussion on WM filters it is important to point out that the two filter definitions given (the replication based definition and the weighted distance definition) are identical. It has been shown that any WM filter based on real valued weights has an equivalent integer valued weight representation ����� As an illustration, multiplying a weight vector by a positive constant results in an identical filter. Thus, the WM filter defined by the weight vector $\mathbf{w} = [1, 2, 3, 2, 1]$ is identical to that used in Example ���� Consequently, there are only a finite number of WM filters for a given window size. The number of WM filters, however, grows rapidly with window size. For instance, there are only � window size � WM filters, but ��� and ������ window size � and � WM filters, respectively ����
3.3 Weighted Order Statistic Filters
The weighting scheme used in WM filters is an effective method for emphasizing samples in certain observation window locations and de-emphasizing others. However, the WM filter output is restricted to be the median of the weight (repetition) expanded set. This lack of freedom in choosing the rank of the output can limit performance in certain cases.

This limitation can be eliminated by allowing the rank of the output to be an adjustable parameter. This leads to the class of WOS filters, which includes WM and all rank-order filters as a subset. Moreover, the more powerful generalizations developed in the remainder of the paper are based on the WOS filtering operation.
The operation of a window size $N$ WOS filter is defined by the $N$ element weight vector $\mathbf{w}$ and the rank parameter $w_0$. For positive integer valued weights and rank parameter (the integer constraint will be lifted shortly), the output of the WOS filter is computed as
\[ y(n) = w_0\text{th largest}\,[\mathbf{x}(n) \diamond \mathbf{w}]. \]
Note that if $w_0 = \frac{1}{2}\left(1 + \sum_{i=1}^{N} w_i\right)$ (or, for non-integer weights, $w_0 = \frac{1}{2} \sum_{i=1}^{N} w_i$), then the WOS filter reduces to a WM filter. The WOS filters also contain rank-order filters as a special case. By restricting each of the weights to be unity, $w_i = 1$, $i = 1, 2, \ldots, N$, the WOS filter output becomes $y(n) = w_0\text{th largest}\,[\mathbf{x}(n) \diamond \mathbf{w}] = x_{(w_0)}$, where again $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ are the order statistics. While rather simple, there are several applications where rank-order filters can be effectively utilized. The demodulation of AM signals is one such example, where the output rank is selected so as to track the envelope function of the AM signal. Figure 11 depicts the AM detection of a 5 kHz tone signal on a 31 kHz carrier and sampled at 250 kHz using an eighth-ranked-order operation with a window size of 9. Figure 11(a) shows the envelope detection when no noise is present, whereas Fig. 11(b) shows the envelope detection in an impulsive noise environment. Note that while impulsive noise is very disruptive with most envelope detectors, the output of the rank-order filter is hardly perturbed by the noise.
As with WM filters, the restriction that the weights, and in this case $w_0$, be integer valued can be relaxed. For non-integer values, $w_0$ is referred to as the threshold and the WOS filter output is determined by the same procedure used to find the WM filter output for non-integer weights. The only difference is that $w_0$ is free to be chosen and is not restricted to $w_0 = \frac{1}{2} \sum_{i=1}^{N} w_i$. Thus, WOS filters have $N + 1$ degrees of freedom. The freedom to set the threshold, in addition to the weights, makes WOS filters a powerful class of filters with wide ranging applications. Moreover, effective (adaptive) optimization procedures exist for WOS filters. Furthermore, since the WM and WOS filters are simple generalizations of the median, we can expect some properties of the median filter to carry over to these more general filters. This is in fact the case for the root signal properties and threshold decomposition. We revisit threshold decomposition next because of its importance in the analysis and optimization of WM and WOS filters.
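A WOS filter output can be sketched with the same accumulation procedure used for the WM filter, with the threshold left as a free parameter. The Python code below is an illustration under assumptions: the function name is hypothetical, and the rank is counted downward from the maximum sample, as in the threshold procedure described for non-integer weights.

```python
def wos_filter(x, w, w0):
    """WOS output for positive weights w and threshold (rank parameter) w0.

    Weights are accumulated from the maximum sample downward and the
    sample whose weight makes the running sum reach w0 is returned.
    """
    pairs = sorted(zip(x, w), key=lambda p: p[0], reverse=True)
    running = 0.0
    for sample, weight in pairs:
        running += weight
        if running >= w0:
            return sample
    raise ValueError("w0 exceeds the sum of the weights")


x = [12, 6, 4, 1, 9]                             # illustrative window
as_wm = wos_filter(x, [1, 2, 3, 2, 1], 5)        # w0 = (1 + sum of weights) / 2
as_max = wos_filter(x, [1, 1, 1, 1, 1], 1)       # unit weights: rank-order filter
as_min = wos_filter(x, [1, 1, 1, 1, 1], 5)
```

Setting the threshold to the middle of the weight sum recovers the WM filter, while unit weights turn the filter into a plain rank-order selector.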
Figure 11: Rank-order AM demodulation. The window size is 9, and the output is the 8th largest in the window. Baseband signal is at 5 kHz with a carrier of 31 kHz. The sampling frequency is 250 kHz. (a) Noiseless reception (original signal and detected signal); (b) noisy reception with impulsive noise Pr��������� (signal corrupted with impulsive noise and detected signal).
3.4 Threshold Decomposition and Logic
As stated above, threshold decomposition extends to the class of WOS filters. To show this, we begin by again restricting the input signal to have $M$ levels. After proving that WOS filters possess the threshold decomposition and stacking properties, the conditions on the input signal are relaxed to allow for the case of real valued inputs.

To begin, denote the input vector as $\mathbf{x}(n) = [x_1, \ldots, x_N]^T$ where $x_i \in Z_M = \{0, 1, \ldots, M - 1\}$. Recall that $\mathbf{x}(n)$ can be decomposed into $M - 1$ binary vectors $\mathbf{X}^1(n), \mathbf{X}^2(n), \ldots, \mathbf{X}^{M-1}(n)$, where the elements of the binary vectors are $X^m_i = T^m(x_i)$ for $m = 1, 2, \ldots, M - 1$ and $i = 1, 2, \ldots, N$. Also, the decomposition is reversible, $x_i = \sum_{m=1}^{M-1} X^m_i$ for $i = 1, 2, \ldots, N$. The decomposition can now be applied to the WOS filter operation,
\[
\begin{aligned}
y(n) &= w_0\text{th largest}\,[x_1 \diamond w_1, \ldots, x_N \diamond w_N] \\
&= w_0\text{th largest}\left[\left(\sum_{m=1}^{M-1} X^m_1\right) \diamond w_1, \ldots, \left(\sum_{m=1}^{M-1} X^m_N\right) \diamond w_N\right].
\end{aligned}
\]
Since replicating each $x_i$ sample $w_i$ times is equivalent to replicating each $X^m_i$ binary sample $w_i$ times and adding all these together, the above can be written as
\[ y(n) = w_0\text{th largest}\left[\sum_{m=1}^{M-1} (X^m_1 \diamond w_1), \ldots, \sum_{m=1}^{M-1} (X^m_N \diamond w_N)\right]. \]
The next step is to invoke the stacking property of threshold decomposition, which states that if $X^p_i = 1$ for a given threshold level $p$, then $X^q_i = 1$ for all levels $q < p$. Similarly, if $X^p_i = 0$ then $X^q_i = 0$ for all $q > p$. Thus, finding the $w_0$th largest sample in the set $x_1 \diamond w_1, \ldots, x_N \diamond w_N$ is equivalent to finding the maximum level $m$ at which there are $w_0$ or more ones in the set $X^m_1 \diamond w_1, \ldots, X^m_N \diamond w_N$. The maximum level which satisfies this condition, in turn, can be found by counting the levels which have $w_0$ or more ones in the binary vectors. Hence, the output of the WOS filter can be written as
\[ y(n) = \sum_{m=1}^{M-1} w_0\text{th largest}\,[X^m_1 \diamond w_1, \ldots, X^m_N \diamond w_N]. \]
This expression can be further simplified as ����
\[ y(n) = \sum_{m=1}^{M-1} f(\mathbf{X}^m; \mathbf{w}), \]
where the function $f(\mathbf{X}; \mathbf{w})$ is a linearly separable threshold function
\[
f(\mathbf{X}^m; \mathbf{w}) =
\begin{cases}
1 & \text{if } \sum_{i=1}^{N} w_i X^m_i \ge w_0 \\
0 & \text{otherwise.}
\end{cases}
\]
The output of a WOS filter can be finally expressed as
\[ y(n) = \sum_{m=1}^{M-1} U\left(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^m\right), \]
where $U(\cdot)$ is a unit step function, and where $\tilde{\mathbf{W}} = [w_0, w_1, w_2, \ldots, w_N]^T$ and $\tilde{\mathbf{X}}^m = [-1, X^m_1, \ldots, X^m_N]^T$ are the extended weight and extended observation vectors, respectively.

Hence, the WOS filter output is shown to not only satisfy the threshold decomposition property but also to be characterized by a sum of linear threshold functions. Notice that in the threshold domain representation above, the WOS filter weights, including $w_0$, are required to be positive but can also be real-valued. The restriction that the input be integer-valued can also be relaxed to allow for real-valued observations. Next, we generalize the threshold decomposition architecture to handle real-valued signals ����
Take $x(n)$ to be nonnegative and real-valued. The nonnegative constraint is taken for convenience, and will be relaxed shortly. As in the integer-valued case, a real-valued observation $x(n)$ can be decomposed into a set of binary signals,
\[ X^{\tau}(n) = U(x(n) - \tau), \quad \tau \ge 0, \]
from which $x(n)$ can be recovered,
\[ x(n) = \int_0^{\infty} X^{\tau}(n)\, d\tau = \int_0^{\infty} U(x(n) - \tau)\, d\tau. \]
The WOS filtering of a real-valued signal can now be implemented using threshold decomposition as
\[ y(n) = \int_0^{\infty} U\left(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^{\tau}\right) d\tau, \]
where $\tilde{\mathbf{X}}^{\tau} = [-1, X^{\tau}_1, X^{\tau}_2, \ldots, X^{\tau}_N]^T$. The integration is simplified by the fact that the observation vector contains, at most, $N$ different valued samples. Consequently, there are at most $N + 1$ different binary vectors $\mathbf{X}^{\tau}$. The possible vectors are
\[
\mathbf{X}^{\tau} =
\begin{cases}
[1, 1, \ldots, 1]^T & \text{if } \tau \in [0, x_{(1)}] \\
[X^{x_{(i)}}_1, X^{x_{(i)}}_2, \ldots, X^{x_{(i)}}_N]^T & \text{if } \tau \in (x_{(i-1)}, x_{(i)}] \\
[0, 0, \ldots, 0]^T & \text{if } \tau \in (x_{(N)}, \infty).
\end{cases}
\]
Using this fact it can be shown that
\[ y(n) = x_{(1)} + \sum_{i=2}^{N} \left(x_{(i)} - x_{(i-1)}\right) U\left(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^{x_{(i)}}\right). \]
This decomposition holds for both integer- and real-valued signals, as well as those that are not strictly positive. Moreover, this decomposition is much more efficient than that originally derived for integer-valued signals since it requires only $N - 1$ threshold logic operations rather than $M$. This reduction in complexity simplifies WOS analysis and optimization, both of which are performed in the threshold domain. By combining this threshold decomposition with unit step function approximations, fast adaptive optimization algorithms can be developed. This is the approach taken in Section �, which describes the optimization of WOS filters.
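The order statistic form of the threshold decomposition can be sketched as follows. This Python illustration assumes a unit step with $U(z) = 1$ for $z \ge 0$, an extended weight vector that starts with $w_0$, and an extended observation vector that starts with $-1$; the function names are hypothetical.

```python
def unit_step(z):
    """Unit step function, taken here as 1 for z >= 0 and 0 otherwise."""
    return 1 if z >= 0 else 0


def wos_threshold_logic(x, w, w0):
    """WOS output computed with N - 1 threshold logic operations.

    Starts from the smallest sample and adds each gap between
    consecutive order statistics whenever the thresholded window,
    taken at the upper order statistic, still satisfies the weighted
    sum test.
    """
    ordered = sorted(x)
    extended_weights = [w0] + list(w)
    y = ordered[0]
    for i in range(1, len(ordered)):
        level = ordered[i]
        extended_obs = [-1] + [1 if xi >= level else 0 for xi in x]
        activation = sum(a * b for a, b in zip(extended_weights, extended_obs))
        y += (ordered[i] - ordered[i - 1]) * unit_step(activation)
    return y


y_pos = wos_threshold_logic([12, 6, 4, 1, 9], [1, 2, 3, 2, 1], 5)
y_neg = wos_threshold_logic([-3, 2, -1], [1, 1, 1], 2)   # inputs need not be positive
```

Because only the $N$ observed values are used as threshold levels, the cost depends on the window size and not on the number of quantization levels.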
4 Time-Rank Coupling Extensions: PWOS Filters
The generalizations of the median filter discussed in the previous section are based on the weighting of samples. In the most general case covered, WOS filters, the observation samples are weighted according to their temporal-order prior to rank ordering. This median filter generalization method, as well as others such as stack filters ��� ��� have been proposed to incorporate some form of temporal-order information into rank-order filters. Still, due to their constrained nature, these methods do not fully utilize the information contained in both the temporal and rank ordering of the observed data �����

An observation set of samples can, of course, be ordered in many ways. In most practical situations samples are observed on a time ordered basis, e.g., from a sensor which is regularly sampled. This results in the temporal-ordered observation $\mathbf{x}$. The samples comprising $\mathbf{x}$ can be permuted such that they are ordered according to a different criterion, such as rank. The rank-ordered samples are written as $\mathbf{x}_L$. Thus, the mapping $\mathbf{x} \mapsto \mathbf{x}_L$ is simply a permutation of samples. Moreover, this permutation mapping contains both the temporal and rank orderings of an observation set of samples.

The temporal and rank natural orderings are important for the filtering process. (Two orderings that arise naturally are temporal and rank. Other natural orderings include spatial, spectral and likelihood.) Rank-order information is vital for reducing the effect of outliers in non-Gaussian environments and accurately tracking non-stationary signal discontinuities. Conversely, temporal-order information is essential for preserving/rejecting signal frequency content and processing temporally correlated signals. The class of Permutation (P) filters has been designed to take full advantage of the permutation mapping $\mathbf{x} \mapsto \mathbf{x}_L$, and consequently, both the temporal- and rank-order of observation samples. By utilizing both orderings, permutation filters have been shown to be both robust and frequency selective ���� ��� ���� Moreover, the temporal- and rank-order information can be simply augmented with additional statistics resulting in extended P filters ����� which have additional capabilities. Selection P filters contain WOS filters, stack filters ���� and some compositions of discrete morphological operators, as a proper subset.

The use of the observation permutation as a basis for filtering has considerable advantages. However, the factorial growth in the number of permutations, as a function of window size, limits the practicality of using the full permutation information. Thus, a subset of the permutation information must be used in practice. Optimizing on what, and how much, temporal and rank information should be used is very difficult. Therefore, we adopt a nested lattice formulation of permutation filters. This lattice formulation gives a well structured method for controlling the amount of temporal and rank information used. Each vertex of the lattice defines a class of P filters which uses a fixed amount of temporal- and rank-order information. This lattice is an extension of the L� ordering used in ��� ���� This extension results in a Lj� time/rank ordering and lattice, where j indicates the amount of rank information incorporated, respectively. To illustrate the concept, the following discussion starts with the simple L�� case. Extensions are then made to the more general cases.
4.1 L�� PWOS filters
The WOS filter operates on limited temporal- and rank-order information. Clearly, samples are weighted according to their temporal-order, or equivalently, their location within the observation window. The expanded set is then rank ordered and the $w_0$th sample taken as the output. The observation samples are ordered only after weighting. That is, the weight applied to a sample is not dependent upon its rank-order. For instance, if the center sample is heavily weighted to reflect its importance, then the observation sample in that location is emphasized regardless of whether it is a "good" sample or an outlier. In fact, all outliers are emphasized under this scheme since each outlier occupies the center observation window location once, assuming the window is sequentially shifted over the sequence one sample at a time.
The samples in the observation window can be more appropriately weighted by considering the temporal- and rank-order of each sample. To accomplish this weighting, define the rank indicator vector $\mathbf{R}_i = [R_{i1}, R_{i2}, \ldots, R_{iN}]^T$, where
\[
R_{ik} =
\begin{cases}
1 & \text{if } x_i \leftrightarrow x_{(k)} \\
0 & \text{otherwise}
\end{cases}
\]
and $x_i \leftrightarrow x_{(k)}$ means that the $k$th order statistic occupies the $i$th temporal location in $\mathbf{x}$. Let the variable $r_i$ be the rank of $x_i$ in $\mathbf{x}_L$; hence, $R_{i r_i} = 1$ by definition. Thus, $\mathbf{R}_i$ is a length $N$ binary vector with a "1" in position $r_i$. The other $N - 1$ positions in the vector are zeros. The $N$ rank indicators can be combined into an $N^2 \times N$ matrix $\mathbf{P}$ that indicates the rank of each sample,
\[
\mathbf{P} =
\begin{bmatrix}
\mathbf{R}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{R}_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \mathbf{0} \\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}_N
\end{bmatrix},
\]
where $\mathbf{0}$ is an $N$ long vector of zeros.
Example ��� Consider the � sample �temporally�ordered� observation x � ��� �� �� ��� which
results in the rank�ordered vector xL � ��� �� �� ��� the four rank indicator vectors and their
respective rank parameters are
R� � �� � �� �T � r� � � R� � �� �� � �T � r� � �R � �� � � ��T � r � � R � ��� � � �T � r � �� ����
Combining them into the P matrix produces�
P �
�����
� � �� j � � � j � � � j � � �
� � � j � �� � j � � � j � � �
� � � j � � � j � � � � j � � �
� � � j � � � j � � � j �� � �
�����T
� ����
where the vertical separation bars have been added for convenience� Thus� the rst section of
the P matrix gives the rank of x�� the second gives the rank of x�� and continuing so until the
last section which is the rank of xN � �
(Footnote: Note that this same rank information could be represented by an $N$ element vector. We use the less efficient representation only to allow simple matrix products. This will simplify the notation used shortly.)
Having defined $\mathbf{P}$, which gives the temporal- and rank-order of each sample, we can now
define a corresponding weight vector. Since the goal is to weight each sample according to its
temporal- and rank-order, the weight vector must have $N^2$ entries. Consider the $i$th temporal sample $x_i$. This sample can take on $N$ rank values, so $N$ weights must be associated with this
sample. Define the weight vector
\[
\mathbf{W}_i = [w_{i,(1)}, w_{i,(2)}, \ldots, w_{i,(N)}]^T
\]
with positive valued elements to be that associated with $x_i$. Thus, each $x_i$ has $N$ weights and
the single weight used at any given instant depends on the rank of $x_i$. Recalling that $r_i$
is the rank of $x_i$, the weight used at each instant is $w_{i,(r_i)}$. Thus, each observation sample is
weighted according to both its temporal- and rank-order.
The $N$ weight vectors can be stacked to form a single PWOS weight vector,
\[
\mathbf{W} = [\mathbf{W}_1^T \,|\, \mathbf{W}_2^T \,|\, \cdots \,|\, \mathbf{W}_N^T]^T.
\]
The appropriate weights from $\mathbf{W}$ (only $N$ weights are used at any given time) can be selected
using $\mathbf{P}$. Once the weights are selected, the output of a PWOS filter is found in an analogous
manner to the WOS filter output. Formally, the PWOS output is defined as
\[
\begin{aligned}
y(n) &= W_0\text{-th Largest}\,[\mathbf{x}^T \diamond \mathbf{W}^T \mathbf{P}] \\
&= W_0\text{-th Largest}\,[x_1 \diamond \mathbf{W}_1^T \mathbf{P}_1,\ x_2 \diamond \mathbf{W}_2^T \mathbf{P}_2,\ \ldots,\ x_N \diamond \mathbf{W}_N^T \mathbf{P}_N] \\
&= W_0\text{-th Largest}\,[x_1 \diamond w_{1,(r_1)},\ x_2 \diamond w_{2,(r_2)},\ \ldots,\ x_N \diamond w_{N,(r_N)}].
\end{aligned}
\]
Thus, each input sample is weighted according to its temporal- and rank-order and the $W_0$-th
largest sample is chosen as the output from the expanded set. Since the weight of each sample
depends on the temporal- and rank-order of one sample (itself), this filter is said to use $L^1\ell$
temporal-rank information and reside at the $L^1\ell$ location on the $L\ell$ lattice, which is defined
shortly.
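For integer weights, the PWOS output defined above amounts to selecting one weight per sample by its rank, replicating each sample that many times, and taking the $W_0$-th largest element of the expanded set. The Python sketch below illustrates this under stated assumptions: the function name `pwos_output`, the weight table `W_hyp`, the threshold and the sample values are all hypothetical and are not taken from the text, and ties are assumed to be ranked in temporal order.

```python
def pwos_output(x, W, w0):
    """PWOS output for one observation window with integer weights.

    x  : list of N samples in temporal order
    W  : N lists of N positive integers, W[i][k - 1] playing the role of w_{i,(k)}
    w0 : threshold, the w0-th largest element of the expanded set is returned
    """
    n = len(x)
    # 1-based rank of every sample (assumption: ties ranked in temporal order).
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for k, i in enumerate(order, start=1):
        ranks[i] = k
    # Select the single weight w_{i,(r_i)} used for each sample.
    selected = [W[i][ranks[i] - 1] for i in range(n)]
    # Replicate each sample according to its selected weight.
    expanded = []
    for xi, wi in zip(x, selected):
        expanded.extend([xi] * wi)
    if not 1 <= w0 <= len(expanded):
        raise ValueError("w0 must lie between 1 and the size of the expanded set")
    expanded.sort(reverse=True)
    return expanded[w0 - 1]


# Hypothetical weights: the middle rank is favored at every temporal location,
# and the center location is favored only when it also holds the middle rank.
W_hyp = [[1, 2, 1],
         [1, 3, 1],
         [1, 2, 1]]

print(pwos_output([4, 5, 6], W_hyp, 3))    # well-behaved center sample
print(pwos_output([2, 100, 5], W_hyp, 2))  # center sample is an outlier
```

In the second call the center sample holds the extreme rank and therefore receives the small weight `W_hyp[1][2]`, whereas a temporal-only weighting would emphasize it regardless of its rank.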
The following examples illustrate the operation of PWOS filtering. The weights in the
example are integer-valued. However, like WOS filters, PWOS filter weights need only be
positive. We give only an integer-valued weight PWOS example, since the output for real-valued weights
is found similarly to the WOS case.
Example ��� Consider the window size � PWOS lter with W� � �� Let x � �x�� x�� x� �
��� �� ��� then� xr � �x���� x���� x��� � ��� �� ��� Let the PWOS weight vector be
$\mathbf{W} = [w_{1,(1)}, w_{1,(2)}, w_{1,(3)} \,|\, \cdots \,|\, w_{3,(1)}, w_{3,(2)}, w_{3,(3)}]^T$
� � �� �� �j�� �� �j�� �� � �T ����
From the observation vector, we can compute the matrix $\mathbf{P}$. The rank indicator vectors for $\mathbf{x}$
are
\[
\mathbf{R}_1 = [0, 0, 1]^T,\ r_1 = 3, \qquad \mathbf{R}_2 = [1, 0, 0]^T,\ r_2 = 1, \qquad \mathbf{R}_3 = [0, 1, 0]^T,\ r_3 = 2.
\]
The weights obtained for the replication of the input samples are computed via $\mathbf{W}^T \mathbf{P}$, which
evaluates to
WTP � ��� �� �j�� �� �j�� �� ��
��� � � �� j � � � j � �
� � � j �� � � j � �
� � � j � � � j � ��
���T
���
� ��� �� ��� ����
The output of the lter is
y�n� � ��thLargest�xT WTP�
� ��thLargest�� �� � �� � ��
� ��thLargest��� �� �� �� �� �� �� �� � ��
����
�
The advantage of considering both temporal- and rank-order when assigning weights is that
outliers can be detected and given a smaller weight. This is illustrated in the accompanying figure, which shows
optimal PWOS filter weights plotted as a mesh function of temporal- and rank-order. The input
to this filter was an image corrupted by heavy-tailed points. As the figure shows, the samples
given the most weight are centrally located in both time and rank. This makes intuitive sense,
as the central temporal samples are expected to be more correlated with the desired center
sample than those which are temporally distant. Similarly, samples that lie in the extreme
ranks may be outliers and should be given smaller weight. Next, we extend temporal-rank
coupling to include more than one sample.
$L^j\ell$ PWOS filters
The weighting scheme derived in the previous section can be extended to include information
on the rank-order of multiple samples. Thus, each input sample can be weighted according to
not only its own temporal- and rank-order, but also the rank-order of its neighbors. This scheme
allows the ranks of adjacent samples to be compared during the weighting process. Through
such comparisons, it can be better determined whether a sample is truly an outlier. For instance, if
two adjacent samples both have high rank, then they may simply be samples which crossed an
edge. If only one sample has high rank, then with higher probability it is an outlier. To take
advantage of neighboring rank information, a general $L^j\ell$ rank coupling technique is developed
next.
[Figure: mesh plot of filter weights; axes: Temporal-order (1 to 9) and Rank-order (1 to 9)]
Figure: Optimal window size 9 PWOS filter weights plotted as a mesh function of temporal- and rank-order. The greatest weight is given to those samples that are centrally located in both time and rank.
In the previous section rank indicators were used to characterize the rank of each (temporal)
sample $x_i$. Suppose that we want to jointly characterize the ranks of two input samples, $x_i$ and
$x_{i \oplus 1}$. If the rank indicator vector for $x_i$, $\mathbf{R}_i$, is given, then we can form an additional indicator vector for $x_{i \oplus 1}$ that does not contain the information provided in $\mathbf{R}_i$. This vector, denoted by $\mathbf{R}_i^1$, is the length $N - 1$ reduced indicator vector formed by removing the $r_i$th element from $\mathbf{R}_{i \oplus 1}$. Thus, $\mathbf{R}_i$ gives the rank of $x_i$ and $\mathbf{R}_i^1$ gives the rank of $x_{i \oplus 1}$, given that we know the rank of $x_i$ already. We can extend this concept to more than two samples. Associated with the $i$th
input sample $x_i$, the reduced rank indicator $\mathbf{R}_i^a$ is formed by removing the $r_i$th, $r_{i \oplus 1}$th, $\ldots$, $r_{i \oplus (a-1)}$th elements from the vector $\mathbf{R}_{i \oplus a}$, where $\oplus$ denotes Modulo $N$ addition, $i \oplus a = (i + a)$ Mod $N$. For example, if $\mathbf{x} = [6, 3, 10, 1]$ and $\mathbf{x}_r = [1, 3, 6, 10]$, then the rank indicator vectors and their
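respective rank parameters are
\[
\begin{aligned}
\mathbf{R}_1 &= [0, 0, 1, 0]^T, & r_1 &= 3, & \mathbf{R}_2 &= [0, 1, 0, 0]^T, & r_2 &= 2, \\
\mathbf{R}_3 &= [0, 0, 0, 1]^T, & r_3 &= 4, & \mathbf{R}_4 &= [1, 0, 0, 0]^T, & r_4 &= 1.
\end{aligned}
\]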
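The reduced rank indicator vectors $\mathbf{R}_1^1$ and $\mathbf{R}_1^2$ are, for example,
\[
\mathbf{R}_1^1 = [0, 1, \underbrace{0}_{r_1}, 0]^T = [0, 1, 0]^T, \qquad
\mathbf{R}_1^2 = [0, \underbrace{0}_{r_2}, \underbrace{0}_{r_1}, 1]^T = [0, 1]^T,
\]
where the underbraces mark the deleted elements: the $r_1$th sample was removed from $\mathbf{R}_{1 \oplus 1} = \mathbf{R}_2$ to get $\mathbf{R}_1^1$, and the $r_1$th and $r_2$th
samples were deleted from $\mathbf{R}_{1 \oplus 2} = \mathbf{R}_3$ to get $\mathbf{R}_1^2$.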
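The reduced rank indicators can be computed mechanically from the rank parameters alone. The following Python sketch assumes the cyclic Modulo $N$ coupling on the index set $\{1, \ldots, N\}$; the function name `reduced_rank_indicator` and its argument layout are hypothetical, and the ranks 3, 2, 4, 1 are those of the example above.

```python
def reduced_rank_indicator(ranks, i, a):
    """Return the reduced rank indicator R_i^a as a list of 0/1 values.

    ranks : list of 1-based ranks, ranks[t - 1] = r_t
    i     : 1-based temporal index of the reference sample
    a     : coupling offset, 0 <= a < N (a = 0 returns R_i itself)
    """
    n = len(ranks)
    if not (1 <= i <= n and 0 <= a < n):
        raise ValueError("need 1 <= i <= N and 0 <= a < N")

    def wrap(t):
        # Cyclic index on {1, ..., N}, so that N maps to N and N + 1 maps to 1.
        return (t - 1) % n + 1

    target = wrap(i + a)
    # Full rank indicator of the coupled sample.
    full = [1 if k == ranks[target - 1] else 0 for k in range(1, n + 1)]
    # Rank positions already occupied by the samples i, i+1, ..., i+(a-1).
    removed = {ranks[wrap(i + b) - 1] for b in range(a)}
    return [bit for k, bit in enumerate(full, start=1) if k not in removed]


ranks = [3, 2, 4, 1]
print(reduced_rank_indicator(ranks, 1, 1))
print(reduced_rank_indicator(ranks, 1, 2))
```

Because the removed positions are exactly the ranks already known from the preceding samples, each reduced indicator still contains a single "1" and has length $N - a$.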
(Footnote: The Modulo $N$ operation defined here is on the group $\{1, 2, \ldots, N\}$, such that $N$ Mod $N = N$ and $(N + 1)$ Mod $N = 1$. The ranks can, of course, be coupled in a fashion other than the cyclical Modulo $N$ method used here; e.g., the next sample coupled to $x_i$ could be that of minimum temporal distance from $x_i$, resulting in coupling progressions $x_i, x_{i+1}, x_{i-1}, x_{i+2}, \ldots$ Such couplings result in similar filter structures and results. For simplicity, we use the notationally simple cyclic Modulo $N$ coupling here.)
The rank indicator vectors $\mathbf{R}_i, \mathbf{R}_i^1, \ldots, \mathbf{R}_i^{j-1}$ can be used to express the ranks of $j$ consecutive samples starting at $x_i$. The rank permutation indicator associated with the $i$th input
sample $x_i$ is defined as
\[
\mathbf{P}_i^j = \mathbf{R}_i \otimes \mathbf{R}_i^1 \otimes \cdots \otimes \mathbf{R}_i^{j-1}
\]
for � � j � N � where � denotes the matrix Kronecker product� Note that the vector Pji haslengthN�N��� � � � �N�j��� � P jN � The indicator vector Pji characterizes the relative rankingof the samples xi� xi��� � � � � xi��j���� Thus� P�i contains no ran