
  • Order-Statistic Filtering and Smoothing of Time-Series: Part II

    Kenneth E. Barner and Gonzalo R. Arce

    K. E. Barner: Applied Science and Engineering Laboratories, Department of Electrical Engineering, University of Delaware, Newark, Delaware 19716. Email: barner@udel.edu

    G. R. Arce: Department of Electrical Engineering, University of Delaware, Newark, Delaware 19716.

    Abstract

    This is the second paper of a two-part tutorial on the fundamentals of univariate time-series filtering using order-statistics, where both temporal and rank orderings are considered jointly. This second paper focuses on order-statistic selection filters, where the filter output is restricted to be one of the input samples. In particular, we treat the class of Weighted Order Statistic (WOS) filters, and the more generalized filter class of Permutation Weighted Order Statistic (PWOS) filters. By combining temporal- and rank-order based weighting with order-statistic selection, detail and edge preserving filters that are robust to outliers and sample contamination can be constructed. Like their weighted sum counterparts, these selection filters can be applied to the smoothing, filtering, and forecasting of time-series. Furthermore, selection filters can be optimized as a function of the underlying signal statistics. While the weighted sum filter optimization is formulated under the Mean Squared Error, the selection filters utilize the more robust Mean Absolute Error (MAE) criterion. This MAE optimization and the selection based estimates result in a robust class of filters that has advantages over the weighted sum counterparts in many applications. The evolution of these filters is covered, and illustrative examples are given demonstrating the properties and performance of this class of estimators.

    Invited paper to appear in the "Handbook of Statistics-17: Order Statistics and Their Applications", C. R. Rao and N. Balakrishnan, Editors.

  • BARNER AND ARCE CONTENTS

    Contents

    1 Introduction
    2 The Median Filter
        2.1 The Running Median Filter
        2.2 Statistical Properties
        2.3 Deterministic Properties
        2.4 Median Filtering and Threshold Decomposition
    3 Weighted Median Filters
        3.1 Center Weighted Median Filters
        3.2 Weighted Median Filters
        3.3 Weighted Order Statistic Filters
        3.4 Threshold Decomposition and Logic
    4 Time-Rank Coupling Extensions: PWOS Filters
        4.1 L�ℓ PWOS filters
        4.2 $L^j\ell$ PWOS filters
        4.3 PWOS Filter Lattices
        4.4 Model Order (Complexity) Reduction
    5 Optimization Techniques
        5.1 Problem Formulation
        5.2 Algorithm I
            5.2.1 The Steepest Descent and LMS Algorithms
        5.3 Algorithm II
    6 Applications to Image Restoration
    7 Conclusion

    1 Introduction

    Data time-series occur naturally in numerous fields of study including economics, engineering, medicine, and many social fields. These time-series must often be processed, or filtered, to extract some information of interest. Traditionally, this filtering has been linear. Certainly, linear filters have a sound theoretical basis and have been extensively studied. Unfortunately, linear filters suffer from poor performance in many applications. Among the signals that linear filters perform poorly on are those with changing levels and corrupting noise that is either heavy tailed or signal dependent [�]. This poor performance has led to the investigation of nonlinear filtering methodologies.

    The design of nonlinear filters can follow many approaches since there is no single underlying theory of nonlinear filters. Thus, nonlinear filters range from simple ad hoc methods designed to tackle a single problem, to increasingly theoretically founded approaches that are more widely applicable. One nonlinear filtering approach that has received considerable attention, and for which much theoretical study has been conducted, is that based on rank-order. Indeed, much attention has been paid to rank-order filters since the running median filter was first applied to the smoothing of time-series by Tukey in ���� [�, �, �, �, �, �].

    The rank-ordering of samples allows the design of filter structures that (a) are robust in environments where the assumed statistics deviate from Gaussian models and are possibly contaminated with outliers, and (b) track signal discontinuities without introducing transient or blurring artifacts as linear filters do. Filter classes that operate on rank-order information can be broadly broken down into two categories according to how the estimate is formed. The two filter categories are weighted sum and selection type. The weighted sum type filters form estimates by weighting the input samples, often as a function of temporal- and rank-order, and then summing the weighted samples to obtain an estimate. Such filters were discussed in Part I of this paper. The selection type filters take a different approach, restricting the output to be one of the input samples. As in the weighted sum case, the input samples can be weighted to reflect importance, but the filter output must be one of the observation samples.

    Selection rank-order filters have advantages over their weighted sum counterparts in many applications. This is particularly true for signals with numerous edges, such as images or biomedical signals where the measured process can change states abruptly. Weighted sum based filters tend to blur the edges of such signals, even if their weights are a function of temporal- and rank-order. In images, accurate tracking of edges is vital due to the nonlinear nature of the human visual system.

    Selection type filters have considerable advantages in edge tracking as compared to weighted sum filters. To illustrate this and motivate the selection approach, consider the raster-scan order filtering of an image corrupted by impulsive type noise. A common approach to limiting the effect of impulsive outliers is through trimming. In a weighted sum approach, this leads to the α-trimmed mean. The output of this filter at instant $n$ is

    $$y(n) = \frac{1}{N - 2(\alpha - 1)} \sum_{i=\alpha}^{N-\alpha+1} x_{(i)}, \tag{1}$$

    where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$ are the $N$ observed samples in rank order. Thus, the α-trimmed mean averages over all but the $\alpha - 1$ smallest and largest samples. If $\alpha = 1$, the sample mean is realized, while for $\alpha = (N+1)/2$, the sample median is realized. A comparable trimming method that is selection type is the center weighted median [�], which can be expressed as

    $$y(n) = \mathrm{MED}[x_{(\alpha)}, x(n), x_{(N-\alpha+1)}]. \tag{2}$$

    For this filter, the output is identical to the input as long as $x_{(\alpha)} \le x(n) \le x_{(N-\alpha+1)}$. If $x(n)$ is outside this range, the output is trimmed to either $x_{(\alpha)}$ or $x_{(N-\alpha+1)}$. To compare the weighted sum and selection approaches, consider the single image scan line shown in Fig. 1. This figure shows the original scan line, the scan corrupted by impulsive noise, and the running trimming statistics $x_{(\alpha)}$ and $x_{(N-\alpha+1)}$. As the figure shows, these statistics form a band between which the samples are either averaged (weighted sum approach) or the input is passed to the output (selection approach). Figure 2 shows the results of the two filtering operations. While both suppress outliers, the selection approach clearly performs better than the weighted sum approach, which excessively smooths all edges. The advantages of the selection approach can be seen more clearly by examining the image in Fig. 3. The upper left quarter is the original "aerial" image, the upper right quarter is the corresponding quarter of the image which has been corrupted by noise, the bottom left quarter is the output of a weighted-sum type order-statistic filter, and the bottom right quarter is the output of a selection-type order-statistic filter. Both filters operate on a raster scan and have a width of seven.

    Figure 1: A single scan line from the image "aerial". The original, corrupted and running order-statistics $x_{(\alpha)}$ and $x_{(N-\alpha+1)}$ are shown (plot legend: Original Signal, Noisy Signal, Trimming Order Statistic). The corruption is additive Laplacian noise with probability of occurrence ��� and σ = ��. Also, N = �� and α = �.

    Figure 2: The output of the selection and weighted sum filters operating on the corrupted scan line in Fig. 1 (plot legend: Original Signal, Selection Filter Output, Weighted Sum Filter Output).
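The contrast between the trimmed mean of (1) and the trimming selection filter of (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy step signal, and the choice of replicating the end samples to fill the window are assumptions made here, not part of the paper's experiments.

```python
from statistics import median

def trimmed_mean(window, alpha):
    """Eq. (1): average the order statistics x_(alpha) .. x_(N-alpha+1)."""
    s = sorted(window)
    n = len(s)
    kept = s[alpha - 1 : n - alpha + 1]  # 1-based ranks alpha .. N-alpha+1
    return sum(kept) / len(kept)

def trimming_selection(window, alpha):
    """Eq. (2): median of x_(alpha), the center sample x(n), and x_(N-alpha+1)."""
    s = sorted(window)
    n = len(s)
    center = window[n // 2]
    return median([s[alpha - 1], center, s[n - alpha]])

def run(signal, n1, alpha, op):
    """Slide a window of size 2*n1 + 1 over the signal (end samples replicated)."""
    padded = [signal[0]] * n1 + list(signal) + [signal[-1]] * n1
    return [op(padded[k : k + 2 * n1 + 1], alpha) for k in range(len(signal))]

# Hypothetical test signal: a step edge with one impulsive outlier.
step = [100] * 6 + [200] * 6
noisy = list(step)
noisy[3] = 255

selected = run(noisy, 2, 2, trimming_selection)  # window N = 5, alpha = 2
averaged = run(noisy, 2, 2, trimmed_mean)
```

On this toy signal the selection output removes the impulse and reproduces the step exactly, while the trimmed mean also removes the impulse but produces intermediate values around the edge.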

    This example illustrates that the selection approach to filtering has clear advantages for certain signals. It is this general category of selection order-statistic filters that we cover in this paper. We begin by giving a brief review of the most well known and thoroughly studied selection order-statistic filter, the median filter. The median filter is also the starting point for many generalizations that have been developed. Therefore, a thorough understanding of the median filter is necessary to fully understand the principles behind the generalizations.

    The median filter, as will be shown, possesses many optimality properties. However, the filter offers little flexibility and is temporally blind. That is, all temporal information is lost in the filtering process. Permuting the time ordered observations, for instance, does not alter the filter output. This lack of temporal information causes performance to suffer. As a result, numerous generalizations of the median filter have been introduced that incorporate some form of temporal information [�, �, �, �, �, ��].

    Temporal information can be incorporated into order-statistic filtering through weighting of time ordered samples prior to rank ordering. This leads to the class of Weighted Median filters and Weighted Order Statistic (WOS) filters [��]. Through weighting, certain temporal samples can be emphasized while others are deemphasized. This weighting scheme incorporates temporal information and results in considerable performance gain over temporally blind (strictly rank-order) filters. Still, the temporal-order weighting followed by rank ordering decouples the temporal from rank information during weighting. Due to this decoupling, these filters use only a fraction of the temporal and rank information contained in the two orderings.

    Figure 3: The image "aerial" broken into four quadrants: upper left, original; upper right, noisy; lower left, weighted sum filter output; lower right, selection output.

    The full temporal and rank information is represented by the mapping that takes one ordering to the other, $p : \mathbf{x} \mapsto \mathbf{x}_L$, where $\mathbf{x}$ and $\mathbf{x}_L$ are vectors containing the temporally and rank ordered observation samples, respectively. The full permutation mapping information can be utilized by coupling the temporal- and rank-order during weighting. This results in the powerful class of Permutation Weighted Order-Statistic (PWOS) filters [��, ��]. While the performance achieved by using the full permutation information can be impressive, the explosive growth in the parameter set limits the number of samples for which the full information can be used. To combat this problem, an $L^j\ell$ lattice approach to coupling temporal- and rank-order information is used. In the lattice terminology, $\ell$ and $L$ refer to temporal- and rank-order respectively. The exponents govern the amount of rank ($j$) coupling used. Thus, the amount of temporal and rank coupling is easily controlled. This offers flexibility in performance as well as control over the parameter set. In addition to the lattice approach, we detail alternative methods for reducing the permutation information while retaining performance gains. To effectively utilize these classes of filters, the parameters must be set appropriately for the task at hand. To this end, we present two adaptive optimization techniques. Lastly, numerous examples are given illustrating the performance of the various filters.

    2 The Median Filter

    The running median filter was the genesis for the broad array of rank order based filtering techniques that exist today, and that continue to be developed. The running median filter was first suggested as a nonlinear smoother for time series data by Tukey in ���� [�]. Since median filters are the foundation upon which current rank order based filtering techniques are based, a thorough understanding of the median filter and its properties is crucial to the development and understanding of current techniques. As such, a brief review of the median filter is given in this section. The review includes formal definitions and a survey of statistical and deterministic properties developed to characterize the median filter performance. Also included is a review of threshold decomposition, which was instrumental in developing many of the median filter properties. This review serves as a starting point for the median filter generalizations developed in the following sections.

    2.1 The Running Median Filter

    To define the running median filter, let $\{x\}$ be a discrete time sequence. The running median passes a window over the sequence $\{x\}$ that selects, at each instant $n$, an odd number of samples to comprise the observation vector $\mathbf{x}(n)$. The observation window is typically symmetric and centered at $n$, resulting in

    $$\mathbf{x}(n) = [x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)]^T, \tag{3}$$

    where $N_1$ may range in value over the nonnegative integers and $N = 2N_1 + 1$ is the (odd valued) window size. While processing such non-causal observation vectors has traditionally been referred to as smoothing, we loosen the terminology somewhat and refer to the processing of both causal and non-causal observations as simply filtering. The median filter operating on the input sequence $\{x\}$ produces the output sequence $\{y\}$, where at time index $n$

    $$y(n) = \mathrm{MED}[\mathbf{x}(n)] = \text{Median value of } [x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)]. \tag{4}$$

    That is, the samples in the observation window are sorted and the middle, or median, value is taken as the output.

    The input sequence $\{x\}$ may be either finite or infinite in extent. For the finite case, the samples of $\{x\}$ can be indexed as $x(1), x(2), \ldots, x(L)$, where $L$ is the length of the sequence. Due to the symmetric nature of the observation window, the window extends beyond a finite extent input sequence at both the beginning and end. These end effects are generally accounted for by appending $N_1$ samples at the beginning and end of $\{x\}$. Although the appended samples can be arbitrarily chosen, typically these are selected so that the points appended at the beginning of the sequence have the same value as the first signal point, and the points appended at the end of the sequence all have the value of the last signal point.

    Figure 4: The operation of the window width 5 median filter (panels: Input, Output; the arrow indicates the filter motion). Appended points are marked separately.

    To illustrate the appending of the input sequence and the median filtering operation, consider the input signal $\{x\}$ of Figure 4. In this example, $\{x\}$ consists of � observations from a 6-level process, $\{x : x(n) \in \{0, 1, \ldots, 5\},\ n = 1, 2, \ldots, �\}$. The figure shows the input sequence and the resulting output sequence for a window size 5 median filter. Note that to account for edge effects, two samples have been appended to both the beginning and end of the sequence. The median filter output at the window location shown in the figure is y(�) = MED[x(�), x(�), x(�), x(�), x(�)] = MED[�, �, �, �, �] = �.
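As a minimal sketch of the running median with end-point appending described above, the following Python fragment replicates the first and last samples $N_1$ times and takes the window median at each position. The function name and the sample values are hypothetical; they are not the signal of Figure 4.

```python
from statistics import median

def running_median(x, n1):
    """Window size N = 2*n1 + 1 running median with end-point replication.

    n1 copies of the first sample are appended at the beginning and n1
    copies of the last sample at the end, so the output has the same
    length as the input.
    """
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [median(padded[k : k + 2 * n1 + 1]) for k in range(len(x))]

# Hypothetical 6-level input (values in 0..5), window size 5 (n1 = 2).
x = [1, 1, 4, 1, 2, 3, 3, 5, 0, 0]
y = running_median(x, 2)
print(y)  # [1, 1, 1, 2, 3, 3, 3, 3, 0, 0]
```

Because the window size is odd, every output value is one of the samples in the corresponding window, which is the defining feature of a selection filter.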

    The median filtering operation is clearly nonlinear. As such, the median filter does not possess the superposition property. Thus, traditional frequency and impulse response analysis are not applicable. The impulse response of a median filter is, in fact, zero for all time. Consequently, alternative methods for analyzing and characterizing median filters must be employed. Broadly speaking, two types of analysis have been applied to the characterization of median filters: statistical and deterministic. Statistical properties examine the performance of the median filter, through such measures as optimality and output variance, for the case of white noise time sequences. Conversely, deterministic properties examine the filter output characteristics for specific types of commonly occurring deterministic time sequences. In the following, we review some of the statistical and deterministic properties of running median filters.

    2.2 Statistical Properties

    The statistical properties of median filters can be examined through the derivation of output distributions and statistical conditions on the optimality of median estimates. These analyses generally assume that the input to the median filter is a constant signal with additive white noise. The assumption that the noise is additive and white is quite natural and made similarly in the analysis of linear filters. The assumption that the underlying signal is a constant is certainly convenient, but more importantly, often valid. This is especially true for the types of signals median filters are most frequently applied to, such as images. Signals such as images are characterized by regions of constant value separated by sharp transitions, or edges. Thus, the statistical analysis of a constant region is valid for large portions of these commonly used signals. By calculating the output distribution of the median filter over a constant region, the noise smoothing capabilities of the median can be measured through statistics such as the filter output variance.

    The median filter properties covered here are for time series signals consisting of white noise observation samples with known distribution. Since the observation sequence is probabilistic, the time index can be dropped and attention focused on a single observation vector. In this case, and others for which the time index $n$ can be dropped without confusion, we do so and denote the observation vector as simply $\mathbf{x} = [x_1, x_2, \ldots, x_N]$. Consider first the case where the observation samples are white noise with a double exponential, or Laplacian, distribution. In this case, the common probability density function (pdf) is given by $f_x(t) = \frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}|t-\beta|/\sigma}$, where $\beta$ and $\sigma^2$ are the mean and variance, respectively. For a vector of samples, the joint pdf is

    $$f_{\mathbf{x}}(\mathbf{t}) = \left(\frac{1}{\sqrt{2}\sigma}\right)^{N} e^{-\frac{\sqrt{2}}{\sigma}\sum_{i=1}^{N} |t_i - \beta|}. \tag{5}$$

    Given an observation vector $\mathbf{x}$, the Maximum Likelihood (ML) estimate of the mean, or location parameter, is found by maximizing (5) with $\mathbf{t} = \mathbf{x}$. To simplify the notation, define the distance operator $D_\eta(\cdot)$ as

    $$D_\eta(\beta) = \sum_{i=1}^{N} |x_i - \beta|^{\eta}. \tag{6}$$

    Then the ML estimate of the location, for Laplacian distributed samples, is the value $\beta$ that minimizes $D_\eta(\beta)$ with $\eta = 1$. It is easy to show that

    $$\mathrm{MED}[x_1, x_2, \ldots, x_N] = \arg\min_{\beta} D_1(\beta). \tag{7}$$

    Thus, the median of the samples $x_1, x_2, \ldots, x_N$ is the value $\beta$ that minimizes $D_1(\beta)$, and consequently, the ML estimate of location for samples with a Laplacian distribution. As a comparison,

    $$\mathrm{MEAN}[x_1, x_2, \ldots, x_N] = \frac{1}{N}\sum_{i=1}^{N} x_i = \arg\min_{\beta} D_2(\beta) \tag{8}$$

    is the ML estimate of location for samples with a Gaussian distribution. The median and sample mean are, thus, optimal estimates of location for the Laplacian and Gaussian distributions, respectively. This shows that for heavy tailed distributions, such as the Laplacian, the median has advantages over the linear combination based sample mean. A further examination of $D_1(\beta)$ and $D_2(\beta)$ reinforces this point. The median is clearly the least absolute error estimate of the center of the distribution for $x_1, x_2, \ldots, x_N$, while the mean is the least squared error estimator. The reliance on the absolute error criterion means that the median is less influenced by outliers than the squared error based mean.
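The minimization properties of the median and the mean can be checked numerically. The sketch below evaluates the distance $D_\eta(\beta) = \sum_i |x_i - \beta|^\eta$ on a grid of candidate locations; the sample values and the grid spacing are arbitrary choices made for illustration, and a brute-force grid search is used only because it keeps the example short.

```python
from statistics import mean, median

def distance(x, beta, eta):
    """D_eta(beta): sum of |x_i - beta|**eta over the observation vector."""
    return sum(abs(xi - beta) ** eta for xi in x)

# Hypothetical observation vector with one large outlier.
x = [2.0, 3.0, 5.0, 7.0, 50.0]

# Candidate locations 0.0, 0.1, ..., 60.0.
grid = [k / 10 for k in range(601)]
best_abs = min(grid, key=lambda b: distance(x, b, 1))  # least absolute error
best_sq = min(grid, key=lambda b: distance(x, b, 2))   # least squared error

print(best_abs, median(x))  # both 5.0
print(best_sq, mean(x))     # both 13.4
```

The outlier at 50 pulls the least squared error estimate far from the bulk of the data, while the least absolute error estimate stays at the middle sample.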

    Having established the types of signals for which median filters are optimal, the filtering operation can be further characterized through the determination of output distributions. Assume again that the input time series consists of white noise samples with pdf $f_x(\cdot)$ and cumulative distribution function (cdf) $F_x(\cdot)$. Under these conditions on the input samples, it is well known that the median filter output cdf, $F_{med}(\cdot)$, and pdf, $f_{med}(\cdot)$, are given by

    $$F_{med}(t) = \sum_{i=N_1+1}^{N} \binom{N}{i} F_x(t)^{i} \left(1 - F_x(t)\right)^{N-i} \tag{9}$$

    and

    $$f_{med}(t) = N \binom{N-1}{N_1} f_x(t)\, F_x(t)^{N_1} \left(1 - F_x(t)\right)^{N_1}, \tag{10}$$

    respectively [��]. From these expressions it can be shown that for $t_1$ and $t_2$ such that $F_x(t_1) = 1 - F_x(t_2)$, then $F_{med}(t_1) = 1 - F_{med}(t_2)$ also holds. By setting $t_1 = t_{0.5}$ (so that $t_2 = t_{0.5}$ as well), where by definition $t_{0.5}$ is the point satisfying $F_x(t_{0.5}) = 1/2$, we see that the median is statistically unbiased in the sense that the median of the input is the median of the output. Moreover, the median behaves consistently for samples with asymmetric distributions.
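The output cdf and pdf expressions for the median filter can be evaluated directly. The following sketch treats the input cdf value $F = F_x(t)$ and pdf value $f = f_x(t)$ as plain numbers; the function names are illustrative, and the density check at the end assumes a uniform input on [0, 1], for which $F_x(t) = t$ and $f_x(t) = 1$.

```python
from math import comb

def median_cdf(F, n):
    """Output cdf of the window size n (odd) median, given F = F_x(t)."""
    n1 = (n - 1) // 2
    return sum(comb(n, i) * F**i * (1 - F) ** (n - i) for i in range(n1 + 1, n + 1))

def median_pdf(f, F, n):
    """Output pdf of the window size n (odd) median, given f = f_x(t), F = F_x(t)."""
    n1 = (n - 1) // 2
    return n * comb(n - 1, n1) * f * F**n1 * (1 - F) ** n1

# Unbiasedness in the median sense: F_x(t) = 1/2 implies F_med(t) = 1/2.
half = median_cdf(0.5, 5)

# Symmetry: F_x(t1) = 1 - F_x(t2) implies F_med(t1) = 1 - F_med(t2).
sym = median_cdf(0.3, 7) + median_cdf(0.7, 7)

# Uniform input on [0, 1]: the pdf should match the slope of the cdf.
h = 1e-6
slope = (median_cdf(0.3 + h, 5) - median_cdf(0.3 - h, 5)) / (2 * h)
density = median_pdf(1.0, 0.3, 5)
```

Here `half` evaluates to 0.5 and `sym` to 1 up to rounding, and the numerically differentiated cdf agrees with the pdf expression.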

    The calculation of statistics such as the output mean and variance from the expressions in (9) and (10) is often quite difficult. Insight into the smoothing characteristics of the median filter can, however, be gained by examining the asymptotic behavior ($N \to \infty$) of these statistics, where, under some general assumptions, results can be derived. For the case of white noise input samples, the asymptotic mean, $\mu_{med}$, and variance, $\sigma^2_{med}$, of the median filter output are

    $$\mu_{med} = t_{0.5} \tag{11}$$

    and

    $$\sigma^2_{med} = \frac{1}{4N\left[f_x(t_{0.5})\right]^2}. \tag{12}$$

    Thus, the median produces a consistent ($\lim_{N\to\infty} \sigma^2_{med} = 0$) and unbiased estimate of the input distribution median, irrespective of the input distribution. Note that the output variance is not proportional to the input variance, but rather to $1/f_x^2(t_{0.5})$. For heavy tailed noise such as impulsive noise, $1/f_x^2(t_{0.5})$ is not related to the input variance, i.e., the input variance is proportional to the impulse magnitude, while $1/f_x^2(t_{0.5})$ is not. Thus, the output variance of the median in this case is not proportional to the input variance. This is not true for the sample mean and further explains the more robust behavior of the median.

    The variances for the sample mean and median filter output are given in Table 1 for the uniform, Gaussian, and Laplacian input distribution cases [��]. The results hold for all $N$ in the uniform case and are asymptotic for the Gaussian and Laplacian cases. Note that the median performs about 3 dB better than the sample mean for the Laplacian case and 2 dB worse in the Gaussian case.

    Table 1: Asymptotic output variances for the window size $N$ mean and median filters for white input samples with uniform, Gaussian, and Laplacian distributions.

    | Input sample probability density function | Mean filter | Median filter |
    |---|---|---|
    | Uniform: $f_x(t) = \frac{1}{2\sqrt{3}\sigma}$ for $-\sqrt{3}\sigma \le t \le \sqrt{3}\sigma$, and 0 otherwise | $\sigma^2/N$ | $3\sigma^2/(N+2)$ |
    | Gaussian: $f_x(t) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(t-\beta)^2/(2\sigma^2)}$ | $\sigma^2/N$ | $\pi\sigma^2/(2N)$ |
    | Laplacian: $f_x(t) = \frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}\lvert t-\beta\rvert/\sigma}$ | $\sigma^2/N$ | $\sigma^2/(2N)$ |
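The decibel figures quoted above follow directly from the ratios of the tabulated variances. A small sketch, with hypothetical function names and unit input variance assumed:

```python
from math import log10, pi

def output_variances(n, sigma2=1.0):
    """(mean filter, median filter) output variances for window size n."""
    return {
        "uniform": (sigma2 / n, 3 * sigma2 / (n + 2)),
        "gaussian": (sigma2 / n, pi * sigma2 / (2 * n)),
        "laplacian": (sigma2 / n, sigma2 / (2 * n)),
    }

def median_gain_db(n, dist):
    """Variance reduction of the median relative to the mean, in dB.

    Positive values mean the median has the smaller output variance.
    """
    mean_var, median_var = output_variances(n)[dist]
    return 10 * log10(mean_var / median_var)

for dist in ("uniform", "gaussian", "laplacian"):
    print(dist, round(median_gain_db(25, dist), 2))
```

For the Laplacian case the ratio is exactly 2, i.e. $10\log_{10} 2 \approx 3.01$ dB in favor of the median; for the Gaussian case the ratio is $2/\pi$, i.e. about 1.96 dB in favor of the mean. The window size 25 used in the loop is arbitrary, and only the uniform entry depends on it.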

    The median filter possesses numerous statistical properties in addition to those discussed above. Among those properties that illustrate the optimality of the median are [�]:

    1. The conditional median at each time instant $n$ is the minimum Mean Absolute Error (MAE) estimator of the signal value at time $n$, where the conditioning is on the past history up to time $n$ of the noise corrupted observations of the signal.

    2. The running median is, with high probability, a maximum a posteriori estimator of a constant signal in symmetric impulsive noise.

    These statistical properties are complemented by a set of deterministic properties, which are discussed next.

    2.3 Deterministic Properties

    Statistical properties give considerable insight into the median filter performance. The median filter cannot, however, be sufficiently characterized through statistical properties alone. For instance, an important question not answered by the statistical properties is what type of signal, if any, is passed through the median filter unaltered. Linear filters, for example, can be analyzed in the frequency domain to determine, among other things, pass- and stop-band frequencies. If the frequency content of the input signal lies exclusively in the filter passband, then the signal passes through the filter unaltered.¹ Conversely, signal content in the stop band does not pass through, or is at least attenuated by, the filter. Somewhat analogous results do in fact exist for the median filter. For median filters, passband or invariant signals are referred to as root signals. The concept of root signals is important to the understanding of median filters and their effect on general signal structures. A review of the significant results in root signal analysis is given in the following along with the main median filter properties resulting from this analysis.

    The definition of a root signal is quite simple: a signal is a median filter root if the signal is invariant under the median filtering operation. Thus, a signal $\{x\}$ is a root of the window size $N = 2N_1 + 1$ median filter if

    $$x(n) = \mathrm{MED}[x(n-N_1), \ldots, x(n), \ldots, x(n+N_1)] \tag{13}$$

    for all $n$. As an example, consider the signal shown in Fig. 5. This signal is filtered by three different window size median filters ($N_1 = 1$, 2, and 3). Note that for the window size three case ($N_1 = 1$), the filter output is a root. That is, further filtering of this signal with the window size three median filter does not alter the signal. Notice, however, that if this same signal is filtered with a larger window size median, the signal will be modified. Thus, the signal in Fig. 5(b) is in the passband, or a root, of an $N_1 = 1$ median filter but outside the passband, or not a root, of the $N_1 = 2$ and $N_1 = 3$ filters.
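The dependence of the root property on window size can be illustrated with a short Python sketch. The helper names and the test signal are hypothetical (this is not the signal of Fig. 5), and end effects are handled by replicating the first and last samples as described in Section 2.1.

```python
from statistics import median

def median_filter(x, n1):
    """Window size 2*n1 + 1 median filter with end-point replication."""
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [median(padded[k : k + 2 * n1 + 1]) for k in range(len(x))]

def is_root(x, n1):
    """True if x is invariant under the window size 2*n1 + 1 median filter."""
    return median_filter(x, n1) == list(x)

# Hypothetical signal: a two-sample pulse on a constant background.
x = [0, 0, 2, 2, 0, 0]

print(is_root(x, 1))        # True: the pulse survives the window size 3 filter
print(is_root(x, 2))        # False: the window size 5 filter removes the pulse
print(median_filter(x, 2))  # [0, 0, 0, 0, 0, 0]
```

The two-sample pulse is wide enough to pass a window of size three but is treated as an impulse by the window of size five.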

    The goal of root analysis is to relate the filtering of desired signals corrupted by noise to root and non-root signals. If it can be shown that certain types of desired signals are in the median filter root set, while noise is outside the root set, then the median filtering of a time series will preserve desired structures while altering the noise. Such a result does in fact hold and will be made clear through the following definitions and properties. First note that, as the example above illustrates, whether or not a signal is a median filter root depends on the window size of the filter in question. Clearly, all signals are roots of the window size one median (identity) filter. To investigate this dependence on window size, median filter root signals can be characterized in terms of local signal structures, where the local signal structures are related to the filter window size. Such a local structure based analysis serves two purposes. First, it defines signal structures that, when properly combined, form the median filter root set. Second, by relating the local structures to the filter window size, the effect of window size on roots is made clear. The local structure analysis of median filter roots relies on the following definitions.

    Definition 2.1 A constant neighborhood is a region of at least $N_1 + 1$ consecutive identically valued points. □

    Definition 2.2 An edge is a monotonic region between two constant neighborhoods of different value. The connecting monotonic region cannot contain any constant neighborhoods. □

    Definition 2.3 An impulse is a constant neighborhood followed by at least one, but no more than $N_1$, points which are then followed by another constant neighborhood having the same value as the first constant neighborhood. The two boundary points of these at most $N_1$ points do not have the same value as the two constant neighborhoods. □

    Definition 2.4 An oscillation is a sequence of points which is not part of a constant neighborhood, an edge or an impulse. □

    Figure 5: Effects of window size on a median filtered signal (panels: input signal $x(n)$; output signal for a window of size 3; output signal for a window of size 5; output signal for a window of size 7). Appended points are marked separately.

    ¹ In general, the pass-band is defined in terms of the magnitude response. Thus, there may be some time shifting of signals in the pass-band, depending on the filter phase response.

    These definitions may now be used to develop a description of those signals which do and those which do not pass through a median filter without being perturbed. In particular, Gallagher and Wise [��] have developed a number of properties which characterize these signal sets for the case of finite length sequences. First, any impulse will be eliminated upon median filtering. Secondly, a finite length signal is a median filter root if it consists of constant neighborhoods and edges only. Thus, if a desired signal is constructed solely of constant neighborhoods and edges, then it will not be altered by the median filtering operation. Conversely, if observation noise consists of impulses (as defined above), it will be removed by the median filtering operation. These median filter root properties are made exact by the following.

    Definition 2.5 A sequence $\{x\}$ is said to be locally monotonic of length $m$, denoted LOMO($m$), if the subsequence $x(n), x(n+1), \ldots, x(n+m-1)$ is monotonic for all $n$. □

    Property 2.1 Given a length $L$ sequence to be median filtered with a length $N = 2N_1 + 1$ window, a necessary and sufficient condition for the signal to be invariant (a root) under median filtering is that the extended (beginning and end appended) signal be LOMO($N_1 + 2$). □

    Thus, the set of signals that forms the passband or root set (invariant to filtering) of a size $N$ median filter consists solely of those signals that are formed of constant neighborhoods and edges. Note that by the definition of LOMO($m$), a change of trend implies that the sequence must stay constant for at least $m - 1$ points. It follows that for a median filter root signal to contain both increasing and decreasing regions, these regions must be separated by a constant neighborhood of at least $N_1 + 1$ identically valued samples. It is also clear from the definition of LOMO($\cdot$) that a LOMO($m_1$) sequence is also LOMO($m_2$) for any two positive integers $m_1 \ge m_2$. This implies that the roots for decreasing window size median filters are nested, i.e., every root of a window size $M$ filter is also a root of a window size $N$ median filter for all $N < M$. This is formalized by:

    Property �.� Let $S$ denote a set of finite length sequences and $R_{N_1}$ be the root set of the window size $N = 2N_1 + 1$ median filter operating on $S$. Then the root sets are nested such that

    $$\cdots \subseteq R_{N_1+1} \subseteq R_{N_1} \subseteq R_{N_1-1} \subseteq \cdots \subseteq R_1 \subseteq R_0 = S.$$ □

    In addition to the above description of the root signal set for a median filter, it can be shown that any signal of finite length is mapped to a root signal by repeated median filtering. In fact, it is simple to show that the first and last points to change value on a median filtering operation remain invariant upon additional filter passes, where repeated filter passes consist of using the output of the prior filter pass for the input of an identical filter on the current pass. This fact, in turn, indicates that any $L$ long nonroot signal (oscillations and impulses) will become a root structure after a maximum of $(L - 2)/2$ successive filterings. This simple bound was improved in [��], where it was shown that at most

    $$3 \left\lceil \frac{L - 2}{2(N_1 + 2)} \right\rceil$$

    passes of the median filter are required to reach a root. This bound is conservative in practice since in most cases a root signal is obtained after ten or so filter passes.
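The root behavior described above can be checked numerically. The following Python sketch is an illustration only: the function names, the use of end-point replication to append samples, and the test signal are assumptions made here, not taken from the paper. It applies a running median repeatedly until the signal stops changing and then tests the extended root for local monotonicity.

```python
def median_filter(x, N):
    """One pass of a window-size-N running median (N odd).

    The signal is extended by repeating its first and last samples
    N1 = (N - 1) // 2 times, so the output has the same length as x.
    """
    if N % 2 == 0:
        raise ValueError("window size N must be odd")
    n1 = (N - 1) // 2
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [sorted(padded[i:i + N])[n1] for i in range(len(x))]


def filter_to_root(x, N):
    """Repeat the median filter until the signal no longer changes.

    Returns the root signal and the number of passes that changed it.
    """
    x = list(x)
    passes = 0
    while True:
        y = median_filter(x, N)
        if y == x:
            return x, passes
        x, passes = y, passes + 1


def is_lomo(x, m):
    """True if every length-m subsequence of x is monotonic."""
    for i in range(len(x) - m + 1):
        seg = x[i:i + m]
        nondecreasing = all(a <= b for a, b in zip(seg, seg[1:]))
        nonincreasing = all(a >= b for a, b in zip(seg, seg[1:]))
        if not (nondecreasing or nonincreasing):
            return False
    return True


def extend(x, n1):
    """Append n1 copies of the end samples (hypothetical helper)."""
    return [x[0]] * n1 + list(x) + [x[-1]] * n1


signal = [1, 1, 3, 0, 2, 2, 4, 1, 1, 3, 3]   # made-up test signal
for N in (3, 5, 7):
    n1 = (N - 1) // 2
    root, passes = filter_to_root(signal, N)
    print(N, passes, root, is_lomo(extend(root, n1), n1 + 2))
```

For this particular signal the window size 3 root still contains a short dip, so it is not invariant under the window size 5 filter, which is consistent with the nesting of the root sets.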

    [Figure �: Root signals obtained by median filters of size 3, 5, and 7. Panels: input signal x(n); root signal for a window of size 3 (1 filter pass); root signal for a window of size 5 (2 filter passes); root signal for a window of size 7 (2 filter passes). Appended points are marked in the plot.]

    The median filter root properties are illustrated through an example in Fig. �. This figure shows an original signal and the resultant root signals after multiple passes of window size 3, 5, and 7 median filters. Note that while it takes only a single pass of the window size 3 median filter to obtain a root, it takes two passes for the window size 5 and 7 median filters. Clearly, the locally monotonic structure requirements of the root signals are satisfied in Fig. �. For the window size 3 case, the input sequence becomes LOMO(3) after a single pass of the filter. Thus, this sequence is in the root set of the window size 3 median filter, but it is not a root of the window size $N > 3$ median filter since it is not LOMO($N_1 + 2$) for $N_1 > 1$.

    The deterministic and statistical properties form a powerful set of tools for describing the median filtering operation and performance. Together, they show that the median filter is an optimal estimator of location for Laplacian noise and that common signal structures, e.g., constant neighborhoods and edges in images, are in the filter pass-band (root set). Moreover, impulses are removed by the filtering operation and repeated passes of the median filter always result in the signal converging to a root, where the root consists of a well defined set of structures related to the filter window size.


    �.� Median Filtering and Threshold Decomposition

    A fundamental property of median filters is threshold decomposition [�]. This property was the key to deriving many of the median filter statistical and deterministic properties. Moreover, threshold decomposition is instrumental in the optimization of the median filter generalizations discussed in the following sections. A review of this important property is therefore in order.

    Threshold decomposition is simply a means of decomposing an $M$-level signal into an equivalent set of $M - 1$ binary sequences.� Let $\mathbf{x}(n) = [x_1, x_2, \ldots, x_N]$ be an $N$ element observation vector where the signal is quantized to $M$ levels such that $x_j \in Z_M = \{0, 1, \ldots, M-1\}$. The threshold decomposition of $\mathbf{x}(n)$ results in the set of binary vectors $\mathbf{X}^1(n), \mathbf{X}^2(n), \ldots, \mathbf{X}^{M-1}(n)$, where $\mathbf{X}^i(n) \in \{0, 1\}^N$ is the observation vector thresholded at level $i$ for $i = 1, 2, \ldots, M - 1$. As a function of the threshold operator $T^i(\cdot)$,

    $$\mathbf{X}^i(n) = T^i(\mathbf{x}(n)) = [T^i(x_1), T^i(x_2), \ldots, T^i(x_N)] = [X^i_1, X^i_2, \ldots, X^i_N],$$

    where $T^i(\cdot)$ is defined as

    $$X^i_j = T^i(x_j) = \begin{cases} 1 & \text{if } x_j \ge i \\ 0 & \text{otherwise} \end{cases}$$

    for $i = 1, 2, \ldots, M - 1$ and $j = 1, 2, \ldots, N$. In terms of the time indexed samples, $X^i(n) = T^i(x(n))$. Threshold decomposition can be reversed by simply adding the threshold decomposed signals,

    $$\mathbf{x}(n) = \sum_{i=1}^{M-1} \mathbf{X}^i(n) \quad \text{and} \quad x(n) = \sum_{i=1}^{M-1} X^i(n).$$

    Furthermore, it was shown by Fitch et al. that the median operation commutes with thresholding [��]. Stated more formally, the median filtering of an $M$-level signal $x(n) \in \{0, 1, \ldots, M-1\}$ is equivalent to filtering the $M - 1$ threshold signals $X^1(n), X^2(n), \ldots, X^{M-1}(n)$, and summing the results,

    $$\mathrm{MED}[\mathbf{x}(n)] = \sum_{i=1}^{M-1} \mathrm{MED}[\mathbf{X}^i(n)]$$

    for all $n$. Thus, threshold decomposition is a weak superposition property. A related property is the partial ordering property known as the stacking property.

    Definition �.� Let $\mathbf{X}$ and $\mathbf{Y}$ be $N$ element binary vectors. Then $\mathbf{X}$ stacks on $\mathbf{Y}$, which is denoted as $\mathbf{Y} \le \mathbf{X}$, if and only if $Y_i \le X_i$ for $i = 1, 2, \ldots, N$. A function $f(\cdot)$ possesses the stacking property if and only if

    $$\mathbf{Y} \le \mathbf{X} \implies f(\mathbf{Y}) \le f(\mathbf{X}).$$

    � For now we restrict the discussion to quantized signals. This restriction is lifted in Section �.�.

  • BARNER AND ARCE � WEIGHTED MEDIAN FILTERS

    [Figure �: Median filtering by threshold decomposition. The 4-valued input signal is filtered by the running sorting method in the top part of the figure. In the bottom part of the figure, the signal is first decomposed into a set of binary signals and each of these is filtered by a binary median filter. The output is produced by adding together the outputs of the binary median filters. Block labels: "Median Filter"; "Threshold at 1, 2, and 3"; "Binary Med. Filter" (three branches); "Add binary outputs".]

    The median filter was shown to possess the stacking property [��], which can be stated as follows. In the threshold decomposition domain, the binary median filter output at threshold level $i$ is always less than or equal to the binary median filter output at lower threshold levels,

    $$\mathrm{MED}[\mathbf{X}^i(n)] \le \mathrm{MED}[\mathbf{X}^j(n)]$$

    for all $i, j$ such that $1 \le j < i \le M - 1$. The stacking property is a partial ordering property. It states that the results of applying the median filter to each of the binary sequences obtained by thresholding the original signal have a specific structure. Thus, in median filtering by threshold decomposition, the input sequence is first decomposed into $M - 1$ binary sequences, and each of these is then filtered by a binary median filter. Furthermore, the set of output sequences possesses the stacking property. As a simple example, consider the median filter of window size three ($N = 3$) being applied to a 4-level input signal as shown in Fig. �. The outputs of the multi-level median filter and of the threshold decomposition median filter are identical because of the weak superposition property.
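The weak superposition and stacking properties lend themselves to a quick numerical check. The sketch below is illustrative only; the function names, the end-point replication at the signal boundaries, and the 4-level test signal are assumptions introduced here. It thresholds a signal at levels 1 to M-1, filters each binary signal, and sums the binary outputs.

```python
def median_filter(x, N):
    """Running median of odd window size N with end-point replication."""
    n1 = (N - 1) // 2
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [sorted(padded[i:i + N])[n1] for i in range(len(x))]


def threshold_decompose(x, M):
    """Return the M-1 binary signals X^1..X^{M-1}, where X^i_j = 1 iff x_j >= i."""
    return [[1 if sample >= level else 0 for sample in x]
            for level in range(1, M)]


def median_by_threshold_decomposition(x, N, M):
    """Filter each binary level signal and add the binary outputs."""
    binary_outputs = [median_filter(b, N) for b in threshold_decompose(x, M)]
    summed = [sum(column) for column in zip(*binary_outputs)]
    return summed, binary_outputs


x = [1, 1, 0, 1, 2, 0, 3, 3, 1, 2, 2]   # illustrative 4-level signal (M = 4)
y_td, levels = median_by_threshold_decomposition(x, N=3, M=4)

# Weak superposition: same result as filtering the multi-level signal directly.
print(y_td == median_filter(x, 3))

# Stacking: the output at a higher threshold never exceeds the one below it.
print(all(hi <= lo
          for upper, lower in zip(levels[1:], levels)
          for hi, lo in zip(upper, lower)))
```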

    � Weighted Median Filters

    Numerous generalizations to the median filtering operation have been introduced since Tukey first suggested the median filter as a smoother in ���� [��]. While many different approaches have been taken in an attempt to improve the median filter performance, most have, in some way, attempted to include temporal information into the filtering process. For most signals, and certainly those of practical interest, it is clear that certain observation samples have a higher degree of correlation with the desired estimate than do others. In the linear filter case, this correlation is reflected in the weight given each sample. A similar weighting approach can be taken to generalize the median filter.

    The sample weighting approach to generalizing the median filter is developed in this section. We begin by discussing the Center Weighted Median (CWM) filter, in which only one sample, the sample centrally located in the observation window, is weighted. This is then generalized to the Weighted Median (WM) filter case in which all observation samples are weighted. In both the CWM and WM filter cases the output is the median value of the weighted set. A further generalization can be achieved by allowing the output to be an order statistic other than the median. This leads to the class of Weighted Order Statistic (WOS) filters. Following the development of these generalizations, we show that each possesses the threshold decomposition property. As noted earlier, threshold decomposition is an extremely powerful tool for both filter analysis and optimization, and is the final topic covered in this section.

    �.� Center Weighted Median Filters

    The median filter is strictly a rank order operator. Thus, all temporal locations within the observation window are considered equivalent. That is, given a window of observation samples, any permutation of the samples within the observation window results in an identical median filter output. As stated above, for most signals certain samples within the observation window are more correlated with the desired estimate than are others. Due to the symmetric nature of the observation window, the sample most correlated with the desired estimate is, in general, the center observation sample.

    The center observation sample can be weighted to reflect its importance, or correlation with the desired estimate. Since median filters select the output in a different fashion than do linear filters, i.e., ranking versus summing, the observation samples must also be weighted differently. In the median filtering case, weighting is accomplished through repetition. Thus, the output of the CWM filter is given by

    $$y(n) = \mathrm{MED}[x_1, \ldots, x_{c-1}, x_c \diamond w_c, x_{c+1}, \ldots, x_N],$$

    where $x_c \diamond w_c$ denotes the replication operator $x_c \diamond w_c = \underbrace{x_c, x_c, \ldots, x_c}_{w_c \text{ times}}$ and $c = (N+1)/2 = N_1 + 1$ is the index of the center sample. The center sample is thus repeated $w_c$ times, where $w_c$ is an odd positive integer. Consequently, the output of the CWM filter is the median over an extended set containing multiple center samples. When $w_c = 1$, the operator is a median filter, and for $w_c \ge N$, the CWM reduces to an identity operation. On the right side of (��) the time index $n$ has been dropped for notational simplicity and the observation samples are indexed according to their location in the observation window. In terms of the time series, the samples in the observation window are $x_i = x(n - (N_1 + 1) + i)$ for $i = 1, 2, \ldots, N$.

    The effect of varying the center sample weight is perhaps best seen by way of an example. Consider a segment of recorded speech. The voiced waveform "a" is shown at the top of Fig. �. This speech signal is taken as the input of a CWM filter of size �. The outputs of the CWM, as the weight parameter $w_c$ is varied from � to �, are also shown in Fig. �. The vertical index denotes the value given to $w_c$. The signal at the top is the original signal, or the output signal of the CWM when $w_c = N$, or � in this example. The second signal from the top is the CWM filtered signal with $w_c = N - 2$. The weight $w_c$ is successively decreased until $w_c = 1$, in which case the CWM filter reduces to the standard median.

    [Figure �: Effects of increasing the center weight of a CWM filter of size $N = �$ operating on the voiced speech "a". The CWM filter output is shown for $w_c = �, �, �, �, �$. Note that for $w_c = 1$ the CWM reduces to the median filter, and for $w_c = �$ it becomes the identity filter. Axes: time n (horizontal), weight w (vertical).]

    The smoothing characteristics of the CWM filter, as a function of the center sample weight, are illustrated in the previous example and figure. Clearly, as $w_c$ is increased less smoothing occurs. This response of the CWM filter is explained by the following property, which relates the weight $w_c$ and the CWM filter output to select order statistics (OS). The $N$ observation samples $x_1, x_2, \ldots, x_N$ can be written as an OS vector,

    $$\mathbf{x}_L = [x_{(1)}, x_{(2)}, \ldots, x_{(N)}],$$

    where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$. The following relation [�, �] utilizes this notation.

    [Figure �: The center weighted median filtering operation. The center observation sample is mapped to the order statistic $x_{(k)}$ ($x_{(N+1-k)}$) if the center sample is less (greater) than $x_{(k)}$ ($x_{(N+1-k)}$), and left unaltered otherwise. The sketch marks $x_{(1)}$, $x_{(k)}$, $x_{(N+1-k)}$ and $x_{(N)}$ along the ordered axis.]

    Property �.� Let $\{y\}$ be the output of a CWM filter operating on the sequence $\{x\}$. Then

    $$y(n) = \mathrm{MED}[x_1, \ldots, x_{c-1}, x_c \diamond w_c, x_{c+1}, \ldots, x_N] = \mathrm{MED}[x_{(k)}, x_c, x_{(N+1-k)}],$$

    where $k = (N + 2 - w_c)/2$ for $1 \le w_c \le N$, and $k = 1$ for $w_c > N$. □

    From this property we can write the CWM filter output $y(n)$ as

    $$y(n) = \begin{cases} x_c & \text{if } x_{(k)} \le x_c \le x_{(N+1-k)} \\ x_{(k)} & \text{if } x_c < x_{(k)} \\ x_{(N+1-k)} & \text{if } x_c > x_{(N+1-k)}. \end{cases}$$

    Since $x(n)$ is the center sample in the observation window, i.e., $x_c = x(n)$, equation (��) indicates that the output of the filter is identical to the input as long as $x(n)$ lies in the interval $[x_{(k)}, x_{(N+1-k)}]$. If the center input sample is greater than $x_{(N+1-k)}$ the filter outputs $x_{(N+1-k)}$, guarding against a high rank order (large) aberrant data point being taken as the output. Similarly, the filter's output is $x_{(k)}$ if the sample $x(n)$ is smaller than this order statistic. This CWM filter performance characteristic is illustrated in Figs. � and �. Figure � shows how the input sample is left unaltered if it is between the trimming statistics $x_{(k)}$ and $x_{(N+1-k)}$ and mapped to one of these statistics if it is outside this range. Figure � shows an example of the CWM filter operating on a Laplacian sequence. Along with the input and output, the trimming statistics are shown. It is easily seen how increasing $k$ tightens the range in which the input is passed directly to the output.
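The equivalence between center-sample replication and trimming to a pair of order statistics can be sketched in a few lines of Python. This is an illustration under assumed names and test windows (none of them from the paper): one function replicates the center sample, the other clips it to the interval between $x_{(k)}$ and $x_{(N+1-k)}$ with $k = (N + 2 - w_c)/2$.

```python
def cwm_by_replication(window, wc):
    """CWM output as the median of the window with the center repeated wc times."""
    window = list(window)
    c = (len(window) - 1) // 2                  # zero-based center index
    expanded = window[:c] + [window[c]] * wc + window[c + 1:]
    return sorted(expanded)[len(expanded) // 2]


def cwm_by_trimming(window, wc):
    """CWM output obtained by clipping the center sample to [x_(k), x_(N+1-k)]."""
    n = len(window)
    xc = window[(n - 1) // 2]
    k = max(1, (n + 2 - wc) // 2)               # k = 1 once wc reaches or exceeds N
    ordered = sorted(window)
    lower, upper = ordered[k - 1], ordered[n - k]   # x_(k) and x_(N+1-k)
    return min(max(xc, lower), upper)


# Made-up windows of odd length; wc runs over odd positive integers.
for window in ([12, 6, 4, 1, 9], [1, 2, 10, 3, 4], [5, 5, -3, 7, 0, 2, 2]):
    for wc in range(1, 13, 2):
        assert cwm_by_replication(window, wc) == cwm_by_trimming(window, wc)
print("replication and trimming agree")
```

The trimming form needs only one sort of the N original samples regardless of the weight, which is why it is a convenient way to reason about how much smoothing a given center weight permits.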

    �.� Weighted Median Filters

    The weighting scheme used by CWM filters can be naturally extended to include all input samples. To this end, let $\mathbf{w} = [w_1, w_2, \ldots, w_N]$ be an $N$ long weight vector with positive integer elements that sum to an odd number, i.e., $\sum_{i=1}^{N} w_i$ is odd. Given this vector of weights, the WM filter operation is defined as [�]

    $$y(n) = \mathrm{MED}[\mathbf{x}(n) \diamond \mathbf{w}] = \mathrm{MED}[x_1 \diamond w_1, x_2 \diamond w_2, \ldots, x_N \diamond w_N].$$

    Thus, WM filters incorporate temporal order information by weighting samples according to their temporal order prior to rank filtering. The filtering operation is illustrated through the following example.

    [Figure �: An example of the CWM filter operating on an i.i.d. Laplacian sequence with unit variance. Shown are the filter input and output sequences as well as the trimming statistics $x_{(k)}$ and $x_{(N+1-k)}$. The filter window size is �� and $k = �$. Legend: input signal, trimming order statistics, filter output.]

    Example �.� Consider the window size 5 WM filter defined by the symmetric weight vector $\mathbf{w} = [1, 2, 3, 2, 1]$. For the observation $\mathbf{x}(n) = [12, 6, 4, 1, 9]$, the filter output is found as

    $$y(n) = \mathrm{MED}[12 \diamond 1, 6 \diamond 2, 4 \diamond 3, 1 \diamond 2, 9 \diamond 1] = \mathrm{MED}[12, 6, 6, 4, 4, 4, 1, 1, 9] = \mathrm{MED}[1, 1, 4, 4, \underline{4}, 6, 6, 9, 12] = 4,$$

    where the median value is underlined in equation (��). The large weighting on the center input sample results in this sample being taken as the output. As a comparison, the standard median output for the given input is $y(n) = 6$. □
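The replication form of the WM filter translates directly into code. The following is a minimal sketch; the function names, the end-point replication used at the signal boundaries, and the sample values are assumptions chosen for illustration.

```python
def weighted_median_replicate(samples, weights):
    """Weighted median for positive integer weights whose sum is odd."""
    if sum(weights) % 2 == 0:
        raise ValueError("integer weights must sum to an odd number")
    expanded = []
    for sample, weight in zip(samples, weights):
        expanded.extend([sample] * weight)      # repeat each sample w_i times
    expanded.sort()
    return expanded[len(expanded) // 2]


def wm_filter(x, weights):
    """Slide the weighted median over x, replicating the end samples."""
    n = len(weights)
    n1 = (n - 1) // 2
    padded = [x[0]] * n1 + list(x) + [x[-1]] * n1
    return [weighted_median_replicate(padded[i:i + n], weights)
            for i in range(len(x))]


window = [12, 6, 4, 1, 9]                # illustrative observation window
print(weighted_median_replicate(window, [1, 2, 3, 2, 1]))   # center-heavy weights
print(weighted_median_replicate(window, [1, 1, 1, 1, 1]))   # plain median
```

With unit weights the function reduces to the ordinary median, so the same routine covers both cases.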

    [Figure ��: The weighted median filtering operation. Block diagram: the input {x} passes through an observation window holding $x_1, x_2, x_3, x_4, \ldots, x_N$; each sample is weighted by $w_1, w_2, w_3, w_4, \ldots, w_N$; the MED block produces the output {y}.]

    The WM filtering operation can be schematically described as in Fig. ��. This figure illustrates that as the filter window slides over an input sequence, the observation samples are duplicated (weighted) according to their temporal order within the window. This replication forms an expanded observation set which is then ordered according to rank, and the median sample selected as the output. In this fashion specific temporal order samples can be emphasized, and others de-emphasized. The figure also illustrates that structurally, the WM filter is similar to the linear FIR filter. This relationship between linear and WM filters can be further explored through an alternative WM filter definition.

    The constraint that the WM filter weights be integer valued can be relaxed through a second, equivalent, filter definition. Thus, let $\mathbf{w}$ be an $N$ element weight vector with positive (possibly non-integer) elements. The output of the WM filter defined by $\mathbf{w}$ and operating on the observation $\mathbf{x}(n)$ can be defined as

    $$y(n) = \arg\min_{\beta} D^{\mathbf{w}}_{1}(\beta),$$

    where $D^{\mathbf{w}}_{\gamma}(\cdot)$ is the weighted distance operator

    $$D^{\mathbf{w}}_{\gamma}(\beta) = \sum_{i=1}^{N} w_i |x_i - \beta|^{\gamma}.$$

    Note that $D^{\mathbf{w}}_{1}(\beta)$ is piecewise linear and convex for $w_i \ge 0$, $i = 1, 2, \ldots, N$. Thus, $\arg\min_{\beta} D^{\mathbf{w}}_{1}(\beta)$ is guaranteed to be one of the input samples $x_1, x_2, \ldots, x_N$. The WM filter output for non-integer weights can be determined from (��) as follows:

    1. Calculate the threshold $w_0 = \frac{1}{2} \sum_{i=1}^{N} w_i$.

    2. Sort the samples in the observation vector $\mathbf{x}(n)$.

    3. Sum the weights corresponding to the sorted samples beginning with the maximum sample and continuing down in order.

    4. The output is the sample whose weight causes the sum to become $\ge w_0$.

    The following example illustrates this procedure.

    Example �.� Consider the window size 5 WM filter defined by the real valued weights $\mathbf{w} = [0.1, 0.2, 0.3, 0.2, 0.1]$. The output for this filter operating on the observation $\mathbf{x}(n) = [12, 6, 4, 1, 9]$ is found as follows. Summing the weights gives the threshold $w_0 = \frac{1}{2} \sum_{i=1}^{5} w_i = 0.45$. The observation samples, sorted observation samples, their corresponding weights, and the partial sums of weights (from each ordered sample to the maximum) are:

    $$\begin{array}{lccccc}
    \text{observation samples} & 12, & 6, & 4, & 1, & 9 \\
    \text{corresponding weights} & 0.1, & 0.2, & 0.3, & 0.2, & 0.1 \\[4pt]
    \text{sorted observation samples} & 1, & 4, & 6, & 9, & 12 \\
    \text{corresponding weights} & 0.2, & 0.3, & 0.2, & 0.1, & 0.1 \\
    \text{partial weight sums} & 0.9, & \underline{0.7}, & 0.4, & 0.2, & 0.1
    \end{array}$$

    Thus, the output is 4 since when starting from the right (maximum sample) and summing the weights, the threshold $w_0 = 0.45$ is not reached until the weight associated with 4 is added. The underlined sum value above indicates that this is the first sum which meets or exceeds the threshold. □
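The four-step threshold procedure for real-valued weights can be sketched as follows. The function name and the sample values are assumptions for illustration; the logic follows the steps as stated (half-sum threshold, sort, accumulate weights from the largest sample downward, stop when the threshold is met).

```python
def weighted_median(samples, weights):
    """Weighted median for positive, possibly non-integer, weights."""
    threshold = 0.5 * sum(weights)                          # step 1
    ordered = sorted(zip(samples, weights), reverse=True)   # step 2, largest first
    running = 0.0
    for sample, weight in ordered:                          # step 3
        running += weight
        if running >= threshold:                            # step 4
            return sample
    raise ValueError("weights must be positive")


window = [12, 6, 4, 1, 9]                     # illustrative observation window
real_weights = [0.1, 0.2, 0.3, 0.2, 0.1]
print(weighted_median(window, real_weights))

# Scaling all weights by a positive constant leaves the output unchanged.
print(weighted_median(window, [10 * w for w in real_weights]))
```

Because only comparisons against half the total weight matter, the procedure is invariant to a common positive scaling of the weights.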

    In the previous section the median and sample mean filters were related through the distance operator $D_{\gamma}(\cdot)$. There, it was shown that $\mathrm{MED}[\mathbf{x}(n)] = \arg\min_{\beta} D_{1}(\beta)$ while $\mathrm{MEAN}[\mathbf{x}(n)] = \arg\min_{\beta} D_{2}(\beta)$. Similar results hold relating the WM and linear FIR filters by means of the weighted distance measure $D^{\mathbf{w}}_{\gamma}(\cdot)$. As stated in (��), the WM of $\mathbf{x}(n)$ is $\arg\min_{\beta} D^{\mathbf{w}}_{\gamma}(\beta)$ for $\gamma = 1$. Interestingly, if the distance norm is changed to two, then

    $$\arg\min_{\beta} D^{\mathbf{w}}_{2}(\beta) = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i},$$

    which is a normalized linear FIR filter [��].
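The two minimizers can be compared numerically. In the sketch below (names, symbols `beta` and `gamma`, and data are assumptions for illustration), the absolute-error cost is minimized by searching over the input samples, which suffices because the cost is piecewise linear and convex, while the squared-error cost is minimized in closed form by the normalized weighted sum.

```python
def weighted_distance(samples, weights, beta, gamma):
    """Weighted distance: sum of w_i * |x_i - beta| ** gamma."""
    return sum(w * abs(x - beta) ** gamma for x, w in zip(samples, weights))


def wm_by_minimization(samples, weights):
    """Minimize the gamma = 1 cost over the input samples themselves."""
    return min(samples, key=lambda beta: weighted_distance(samples, weights, beta, 1))


def normalized_weighted_mean(samples, weights):
    """Closed-form minimizer of the gamma = 2 cost."""
    return sum(w * x for x, w in zip(samples, weights)) / sum(weights)


window = [12, 6, 4, 1, 9]                    # illustrative observation window
weights = [0.1, 0.2, 0.3, 0.2, 0.1]
print(wm_by_minimization(window, weights))
print(normalized_weighted_mean(window, weights))
```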

    Before ending the discussion on WM filters it is important to point out that the two filter definitions given (equations (��) and (��)) are equivalent. It has been shown that any WM filter based on real valued weights has an equivalent integer valued weight representation [��]. As an illustration, multiplying a weight vector by a positive constant results in an identical filter. Thus, the WM filter defined by the weight vector $\mathbf{w} = [1, 2, 3, 2, 1]$ is identical to that used in Example �.�. Consequently, there are only a finite number of WM filters for a given window size. The number of WM filters, however, grows rapidly with window size. For instance, there are only � window size � WM filters, but ��� and ������ window size � and � WM filters, respectively [�].


    �.� Weighted Order Statistic Filters

    The weighting scheme used in WM filters is an effective method for emphasizing samples in certain observation window locations and de-emphasizing others. However, the WM filter output is restricted to be the median of the weight (repetition) expanded set. This lack of freedom in choosing the rank of the output can limit performance in certain cases.

    This limitation can be eliminated by allowing the rank of the output to be an adjustable parameter. This leads to the class of WOS filters, which includes WM and all rank-order filters as a subset. Moreover, the more powerful generalizations developed in the remainder of the paper are based on the WOS filtering operation.

    The operation of a window size $N$ WOS filter is defined by the $N$ element weight vector $\mathbf{w}$ and the rank parameter $w_0$. For positive integer valued weights and rank parameter (the integer constraint will be lifted shortly), the output of the WOS filter is computed as

    $$y(n) = w_0\text{-th Largest}\,[\mathbf{x}(n) \diamond \mathbf{w}].$$

    Note that if $w_0 = \frac{1}{2}\left(1 + \sum_{i=1}^{N} w_i\right)$ (or, for non-integer weights, $w_0 = \frac{1}{2} \sum_{i=1}^{N} w_i$), then the WOS filter reduces to a WM filter. The WOS filters also contain rank-order filters as a special case. By restricting each of the weights to be unity, $w_i = 1$, $i = 1, 2, \ldots, N$, the WOS filter output becomes $y(n) = w_0\text{-th Largest}\,[\mathbf{x}(n) \diamond \mathbf{w}] = x_{(N+1-w_0)}$, where again $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ are the order statistics in ascending order. While rather simple, there are several applications where rank-order filters can be effectively utilized. The demodulation of AM signals is one such example, where the output rank is selected so as to track the envelope function of the AM signal. Figure �� depicts the AM detection of a � kHz tone signal on a �� kHz carrier and sampled at �� kHz using an eighth-ranked-order operation with a window size of �. Figure �� (a) shows the envelope detection when no noise is present, whereas Fig. �� (b) shows the envelope detection in an impulsive noise environment. Note that while impulsive noise is very disruptive with most envelope detectors, the output of the rank-order filter is hardly perturbed by the noise.

    As with WM filters, the restriction that the weights, and in this case $w_0$, be integer valued can be relaxed. For non-integer values, $w_0$ is referred to as the threshold and the WOS filter output is determined by the same procedure used to find the WM filter output for non-integer weights. The only difference is that $w_0$ is free to be chosen and is not restricted to $w_0 = \frac{1}{2} \sum_{i=1}^{N} w_i$. Thus, WOS filters have $N + 1$ degrees of freedom. The freedom to set the threshold, in addition to the weights, makes WOS filters a powerful class of filters with wide ranging applications. Moreover, effective (adaptive) optimization procedures exist for WOS filters. Furthermore, since the WM and WOS filters are simple generalizations of the median, we can expect some properties of the median filter to extrapolate to these more general filters. This is in fact the case for the root signal properties and threshold decomposition. We revisit threshold decomposition next because of its importance in the analysis and optimization of WM and WOS filters.
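A WOS filter differs from the WM procedure only in that the threshold is a free parameter. The sketch below is a minimal illustration with assumed names and sample values: weights are accumulated from the largest sample downward and the output is the sample at which the running sum first reaches $w_0$.

```python
def wos(samples, weights, w0):
    """Weighted order statistic with weights `weights` and threshold `w0`."""
    running = 0.0
    for sample, weight in sorted(zip(samples, weights), reverse=True):
        running += weight
        if running >= w0:
            return sample
    raise ValueError("w0 exceeds the sum of the weights")


window = [12, 6, 4, 1, 9]                    # illustrative observation window

# Unit weights give a rank-order filter: w0 = 2 selects the second largest.
print(wos(window, [1, 1, 1, 1, 1], 2))

# w0 = (1 + sum of weights) / 2 recovers the weighted median.
weights = [1, 2, 3, 2, 1]
print(wos(window, weights, (1 + sum(weights)) // 2))
```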

    [Figure ��: Rank-order AM demodulation. The window size is �, and the output is the 8th largest in the window. Baseband signal is at � kHz with a carrier of �� kHz. The sampling frequency is �� kHz. (a) Noiseless reception. (b) Noisy reception with impulsive noise Pr���������. Panel titles: "Original signal" (original and detected signals) and "Signal corrupted with impulsive noise" (noisy and detected signals).]

    �.� Threshold Decomposition and Logic

    As stated above, threshold decomposition extends to the class of WOS filters. To show this, we begin by again restricting the input signal to have $M$ levels. After proving that WOS filters possess the threshold decomposition and stacking properties, the conditions on the input signal are relaxed to allow for the case of real valued inputs.

    To begin, denote the input vector as $\mathbf{x}(n) = [x_1, \ldots, x_N]^T$ where $x_i \in Z_M = \{0, 1, \ldots, M-1\}$. Recall that $\mathbf{x}(n)$ can be decomposed into $M - 1$ binary vectors $\mathbf{X}^1(n), \mathbf{X}^2(n), \ldots, \mathbf{X}^{M-1}(n)$, where the elements of the binary vectors are $X^m_i = T^m(x_i)$ for $m = 1, 2, \ldots, M - 1$ and $i = 1, 2, \ldots, N$. Also, the decomposition is reversible, $x_i = \sum_{m=1}^{M-1} X^m_i$ for $i = 1, 2, \ldots, N$. The decomposition can now be applied to the WOS filter operation,

    $$y(n) = w_0\text{-th largest}\,[x_1 \diamond w_1, \ldots, x_N \diamond w_N] = w_0\text{-th largest}\left[\left(\sum_{m=1}^{M-1} X^m_1\right) \diamond w_1, \ldots, \left(\sum_{m=1}^{M-1} X^m_N\right) \diamond w_N\right].$$

    Since replicating each $x_i$ sample $w_i$ times is equivalent to replicating each $X^m_i$ binary sample $w_i$ times and adding all these together, the above can be written as

    $$y(n) = w_0\text{-th largest}\left[\sum_{m=1}^{M-1} (X^m_1 \diamond w_1), \ldots, \sum_{m=1}^{M-1} (X^m_N \diamond w_N)\right].$$

    The next step is to invoke the stacking property of threshold decomposition, which states that if $X^p_i = 1$ for a given threshold level $p$, then $X^q_i = 1$ for all levels $q < p$. Similarly, if $X^p_i = 0$ then $X^q_i = 0$ for all $q > p$. Thus, finding the $w_0$-th largest sample in the set $x_1 \diamond w_1, \ldots, x_N \diamond w_N$ is equivalent to finding the maximum level $m$ at which there are $w_0$ or more ones in the set $X^m_1 \diamond w_1, \ldots, X^m_N \diamond w_N$. The maximum level which satisfies this condition, in turn, can be found by counting the levels which have $w_0$ or more ones in the binary vectors. Hence, the output of the WOS filter can be written as

    $$y(n) = \sum_{m=1}^{M-1} w_0\text{-th largest}\,[X^m_1 \diamond w_1, \ldots, X^m_N \diamond w_N].$$

    This expression can be further simplified as [�]

    $$y(n) = \sum_{m=1}^{M-1} f(\mathbf{X}^m; \mathbf{w}),$$

    where the function $f(\mathbf{X}; \mathbf{w})$ is a linearly separable threshold function

    $$f(\mathbf{X}^m; \mathbf{w}) = \begin{cases} 1 & \text{if } \sum_{i=1}^{N} w_i X^m_i \ge w_0 \\ 0 & \text{otherwise.} \end{cases}$$

    The output of a WOS filter can finally be expressed as

    $$y(n) = \sum_{m=1}^{M-1} U(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^m),$$

    where $U(\cdot)$ is a unit step function, and where $\tilde{\mathbf{W}} = [w_0, w_1, w_2, \ldots, w_N]^T$ and $\tilde{\mathbf{X}}^m = [-1, X^m_1, \ldots, X^m_N]^T$ are the extended weight and extended observation vectors, respectively.
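The sum-of-threshold-functions form can be verified against a direct WOS computation. The sketch below is illustrative; the function names and the test data are assumptions, and the unit step is taken to be 1 for a nonnegative argument so that "greater than or equal to $w_0$" maps to an output of one.

```python
def unit_step(v):
    """Unit step with U(0) = 1."""
    return 1 if v >= 0 else 0


def wos_direct(samples, weights, w0):
    """w0-th largest of the weight-expanded set (integer weights and w0)."""
    expanded = sorted(
        (s for s, w in zip(samples, weights) for _ in range(w)), reverse=True)
    return expanded[w0 - 1]


def wos_threshold_logic(samples, weights, w0, M):
    """Sum over levels m = 1..M-1 of U(W~^T X~^m)."""
    w_ext = [w0] + list(weights)                      # extended weight vector
    total = 0
    for m in range(1, M):
        x_ext = [-1] + [1 if s >= m else 0 for s in samples]   # extended binary vector
        total += unit_step(sum(w * b for w, b in zip(w_ext, x_ext)))
    return total


samples = [12, 6, 4, 1, 9]           # illustrative samples in {0, ..., 12}, so M = 13
weights = [1, 2, 3, 2, 1]
for w0 in range(1, sum(weights) + 1):
    assert wos_threshold_logic(samples, weights, w0, 13) == wos_direct(samples, weights, w0)
print("threshold logic matches direct WOS for every rank")
```

The cost of this form grows with the number of levels M, which motivates a decomposition whose cost depends only on the window size.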

    Hence, the WOS filter output is shown to not only satisfy the threshold decomposition property but also to be characterized by a sum of linear threshold functions. Notice that in the threshold domain (��) the WOS filter weights, including $w_0$, are required to be positive but can also be real-valued. The restriction that the input be integer-valued can also be relaxed to allow for real-valued observations. Next, we generalize the threshold decomposition architecture to handle real-valued signals [�].

  • BARNER AND ARCE TIMERANK COUPLING EXTENSIONS� PWOS FILTERS

    Take $x(n)$ to be nonnegative and real-valued. The nonnegative constraint is taken for convenience, and will be relaxed shortly. As in the integer-valued case, a real-valued observation $x(n)$ can be decomposed into a set of binary signals,

    $$X^{\tau}(n) = U(x(n) - \tau), \quad \tau \ge 0,$$

    from which $x(n)$ can be recovered,

    $$x(n) = \int_{0}^{\infty} X^{\tau}(n)\, d\tau = \int_{0}^{\infty} U(x(n) - \tau)\, d\tau.$$

    The WOS filtering of a real-valued signal can now be implemented using threshold decomposition as

    $$y(n) = \int_{0}^{\infty} U(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^{\tau})\, d\tau,$$

    where $\tilde{\mathbf{X}}^{\tau} = [-1, X^{\tau}_1, X^{\tau}_2, \ldots, X^{\tau}_N]^T$. The integration is simplified by the fact that the observation vector contains, at most, $N$ different valued samples. Consequently, there are at most $N + 1$ different binary vectors $\mathbf{X}^{\tau}$. The possible vectors are

    $$\mathbf{X}^{\tau} = \begin{cases} [1, 1, \ldots, 1]^T & \text{if } \tau \le x_{(1)} \\ [X^{x_{(i)}}_1, X^{x_{(i)}}_2, \ldots, X^{x_{(i)}}_N]^T & \text{if } \tau \in (x_{(i-1)}, x_{(i)}] \\ [0, 0, \ldots, 0]^T & \text{if } \tau > x_{(N)}. \end{cases}$$

    Using this fact it can be shown that

    $$y(n) = x_{(1)} + \sum_{i=2}^{N} (x_{(i)} - x_{(i-1)})\, U(\tilde{\mathbf{W}}^T \tilde{\mathbf{X}}^{x_{(i)}}).$$

    This decomposition holds for both integer- and real-valued signals, as well as those that are not strictly positive. Moreover, this decomposition is much more efficient than that originally derived for integer-valued signals since it requires only $N - 1$ threshold logic operations rather than $M$. This reduction in complexity simplifies WOS analysis and optimization, both of which are performed in the threshold domain. By combining this threshold decomposition with unit step function approximations, fast adaptive optimization algorithms can be developed. This is the approach taken in Section �, which describes the optimization of WOS filters.
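The real-valued decomposition needs thresholds only at the observed sample values. The following sketch is an illustration with assumed names and data; it assumes $w_0$ does not exceed the sum of the weights, so that the threshold function fires at the lowest level and the leading $x_{(1)}$ term is valid. The result is compared with a direct WOS computation.

```python
def wos_direct(samples, weights, w0):
    """Accumulate weights from the largest sample down until w0 is reached."""
    running = 0.0
    for sample, weight in sorted(zip(samples, weights), reverse=True):
        running += weight
        if running >= w0:
            return sample
    raise ValueError("w0 exceeds the sum of the weights")


def wos_real_valued(samples, weights, w0):
    """x_(1) plus the sum of (x_(i) - x_(i-1)) * U(W~^T X~ thresholded at x_(i))."""
    ordered = sorted(samples)
    y = ordered[0]
    for prev, cur in zip(ordered, ordered[1:]):
        # Weighted count of samples at or above the threshold level x_(i).
        active = sum(w for s, w in zip(samples, weights) if s >= cur)
        if active - w0 >= 0:                  # unit step with U(0) = 1
            y += cur - prev
    return y


samples = [0.3, -1.2, 2.5, 0.3, 1.1]          # illustrative real-valued window
weights = [0.1, 0.2, 0.3, 0.2, 0.1]
w0 = 0.45
print(wos_real_valued(samples, weights, w0), wos_direct(samples, weights, w0))
```

Only N - 1 threshold evaluations are made, independent of how finely the samples are quantized. The accumulated differences can carry floating-point rounding, so the two results should be compared with a tolerance rather than for exact equality.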

    � Time-Rank Coupling Extensions: PWOS Filters

    The generalizations of the median filter discussed in the previous section are based on the weighting of samples. In the most general case covered, WOS filters, the observation samples are weighted according to their temporal order prior to rank ordering. This median filter generalization method, as well as others such as stack filters [�, ��], have been proposed to incorporate some form of temporal-order information into rank-order filters. Still, due to their constrained nature, these methods do not fully utilize the information contained in both the temporal and rank ordering of the observed data [��].

    An observation set of samples can, of course, be ordered in many ways. In most practical situations samples are observed on a time ordered basis, e.g., from a sensor which is regularly sampled. This results in the temporal-ordered observation $\mathbf{x}$. The samples comprising $\mathbf{x}$ can be permuted such that they are ordered according to a different criterion, such as rank. The rank-ordered samples are written as $\mathbf{x}_L$. Thus, the mapping $\mathbf{x} \mapsto \mathbf{x}_L$ is simply a permutation of samples. Moreover, this permutation mapping contains both the temporal and rank orderings of an observation set of samples.

    The temporal and rank natural orderings are important for the filtering process. Rank-order information is vital for reducing the effect of outliers in non-Gaussian environments and accurately tracking non-stationary signal discontinuities. Conversely, temporal-order information is essential for preserving/rejecting signal frequency content and processing temporally correlated signals. The class of permutation (P) filters has been designed to take full advantage of the permutation mapping $\mathbf{x} \mapsto \mathbf{x}_L$ and, consequently, both the temporal- and rank-order of observation samples. By utilizing both orderings, permutation filters have been shown to be both robust and frequency selective [...]. Moreover, the temporal- and rank-order information can be simply augmented with additional statistics, resulting in extended P filters [...], which have additional capabilities. Selection P filters contain WOS filters, stack filters [...], and some composition of discrete morphological operators, as a proper subset.

    The use of the observation permutation as a basis for filtering has considerable advantages. However, the factorial growth in the number of permutations, as a function of window size, limits the practicality of using the full permutation information. Thus, a subset of the permutation information must be used in practice. Optimizing on what, and how much, temporal and rank information should be used is very difficult. Therefore, we adopt a nested lattice formulation of permutation filters. This lattice formulation gives a well structured method for controlling the amount of temporal and rank information used. Each vertex of the lattice defines a class of P filters which uses a fixed amount of temporal- and rank-order information. This lattice is an extension of the $L\ell$ ordering used in [...]. This extension results in an $L^j\ell$ time/rank ordering and lattice, where $j$ indicates the amount of rank information incorporated. To illustrate the concept, the following discussion starts with the simple $L^1\ell$ case. Extensions are then made to the more general cases.
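    To give a feel for the factorial growth mentioned above, the sketch below tabulates the number of full permutations of a window against the number of relative rankings of only j samples, N(N-1)...(N-j+1), which is the rank permutation indicator length given later in this section. The function names and the chosen window sizes are illustrative assumptions.

```python
import math


def full_permutation_count(n):
    """Number of distinct permutations of an n-sample window: n!."""
    return math.factorial(n)


def partial_ranking_count(n, j):
    """Number of relative rankings of j samples in a window of n samples:
    n (n - 1) ... (n - j + 1)."""
    return math.factorial(n) // math.factorial(n - j)


for n in (3, 5, 9):
    partial = [partial_ranking_count(n, j) for j in (1, 2, 3)]
    print(n, full_permutation_count(n), partial)
```

    Restricting the rank information to small j keeps the count polynomial in the window size, which is the practical motivation for moving along the lattice one step at a time.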

    $L^1\ell$ PWOS filters

    The WOS filter operates on limited temporal- and rank-order information. Clearly, samples are weighted according to their temporal-order, or equivalently, their location within the observation window. The expanded set is then rank ordered and the $W_0$th sample taken as the output. The observation samples are ordered only after weighting. That is, the weight applied to a sample is not dependent upon its rank-order. For instance, if the center sample is heavily weighted to reflect its importance, then the observation sample in that location is emphasized regardless of whether it is a "good" sample or an outlier. In fact, all outliers are emphasized under this scheme since each outlier occupies the center observation window location once, assuming the window is sequentially shifted over the sequence one sample at a time.

    Footnote: Two orderings that arise naturally are temporal and rank. Other natural orderings include spatial, spectral and likelihood.
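    The effect of rank-independent weighting can be seen in a small numerical sketch. The signal, the weights and the helper names below are hypothetical. The second filter uses a deliberately heavy center weight, so the single outlier passes straight through when it reaches the center of the window.

```python
def wos_filter(window, weights, w0):
    """w0-th largest element of the weight-replicated window (integer weights)."""
    expanded = []
    for sample, weight in zip(window, weights):
        expanded.extend([sample] * weight)
    expanded.sort(reverse=True)
    return expanded[w0 - 1]


def wos_smooth(signal, weights, w0):
    """Slide the window over the signal one sample at a time.
    Window positions that would run past the ends are skipped."""
    n = len(weights)
    return [wos_filter(signal[k:k + n], weights, w0)
            for k in range(len(signal) - n + 1)]


signal = [1, 2, 1, 100, 2, 1, 2]                 # hypothetical data, one outlier
median_out = wos_smooth(signal, [1, 1, 1], 2)    # plain median of three
center_out = wos_smooth(signal, [1, 3, 1], 3)    # heavily weighted center sample
print(median_out)
print(center_out)
```

    With these particular weights the center-weighted filter reduces to the identity, an extreme case chosen only to make the emphasis of the outlier obvious.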

    The samples in the observation window can be more appropriately weighted by considering the temporal- and rank-order of each sample. To accomplish this weighting, define the rank indicator vector $\mathbf{R}_i = [R_{i1}, R_{i2}, \ldots, R_{iN}]^T$, where

    $$ R_{ik} = \begin{cases} 1 & \text{if } x_i \leftrightarrow x_{(k)} \\ 0 & \text{otherwise} \end{cases} $$

    and $x_i \leftrightarrow x_{(k)}$ means that the $k$th order statistic occupies the $i$th temporal location in $\mathbf{x}$. Let the variable $r_i$ be the rank of $x_i$ in $\mathbf{x}_L$; hence, $R_{i r_i} = 1$ by definition. Thus, $\mathbf{R}_i$ is a length $N$ binary vector with a "1" in position $r_i$. The other $N - 1$ positions in the vector are zeros. The $N$ rank indicators can be combined into an $N^2 \times N$ matrix $\mathbf{P}$ that indicates the rank of each sample,

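    $$ \mathbf{P} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \mathbf{0} \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}_N \end{bmatrix} $$

    where $\mathbf{0}$ is an $N$ long vector of zeros.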

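    Example: Consider the 4 sample (temporally-ordered) observation $\mathbf{x} = [x_1, x_2, x_3, x_4]$ with $x_4 < x_2 < x_1 < x_3$, which results in the rank-ordered vector $\mathbf{x}_L = [x_4, x_2, x_1, x_3]$. The four rank indicator vectors and their respective rank parameters are

    $$ \begin{aligned} \mathbf{R}_1 &= [0, 0, 1, 0]^T, & r_1 &= 3, & \mathbf{R}_2 &= [0, 1, 0, 0]^T, & r_2 &= 2, \\ \mathbf{R}_3 &= [0, 0, 0, 1]^T, & r_3 &= 4, & \mathbf{R}_4 &= [1, 0, 0, 0]^T, & r_4 &= 1. \end{aligned} $$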

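    Combining them into the $\mathbf{P}$ matrix produces

    $$ \mathbf{P} = \left[ \begin{array}{cccc|cccc|cccc|cccc} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{array} \right]^T $$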

    where the vertical separation bars have been added for convenience. Thus, the first section of the $\mathbf{P}$ matrix gives the rank of $x_1$, the second gives the rank of $x_2$, and continuing so until the last section which is the rank of $x_N$.
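    The construction of the rank indicators and of the matrix P can be sketched in a few lines. The sample values below are hypothetical and were chosen only so that the ranks come out as 3, 2, 4, 1, matching the rank parameters of the example. The function names and the tie-breaking rule (earlier temporal position ranks lower) are assumptions of this sketch.

```python
def rank_parameters(x):
    """Return r_1, ..., r_N: the 1-based rank of each temporal sample.
    Ties are broken by temporal position (an assumption of this sketch)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for k, i in enumerate(order, start=1):
        ranks[i] = k
    return ranks


def rank_indicators(x):
    """Return R_1, ..., R_N as length-N binary lists with a 1 in position r_i."""
    n = len(x)
    return [[1 if k == r else 0 for k in range(1, n + 1)]
            for r in rank_parameters(x)]


def permutation_matrix(x):
    """Return the N^2 x N block-diagonal matrix P whose i-th diagonal
    block is the column vector R_i."""
    n = len(x)
    P = [[0] * n for _ in range(n * n)]
    for i, R in enumerate(rank_indicators(x)):
        for k, bit in enumerate(R):
            P[i * n + k][i] = bit
    return P


x = [6, 5, 9, 1]                       # hypothetical observation
print(rank_parameters(x))
print(rank_indicators(x))
```

    Each column of P holds exactly one nonzero entry, so a product of the form W^T P simply picks one weight per temporal sample.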

    Footnote: Note that this same rank information could be represented by an $N^2$ element vector. We use the less efficient representation only to allow simple matrix products. This will simplify the notation used shortly.


    Having defined $\mathbf{P}$, which gives the temporal- and rank-order of each sample, we can now define a corresponding weight vector. Since the goal is to weight each sample according to its temporal- and rank-order, the weight vector must have $N^2$ entries. Consider the $i$th temporal sample $x_i$. This sample can take on $N$ rank values, so $N$ weights must be associated with this sample. Define the weight vector

    $$ \mathbf{W}_i = [w_{i,(1)}, w_{i,(2)}, \ldots, w_{i,(N)}]^T $$

    with positive valued elements to be that associated with $x_i$. Thus, each $x_i$ has $N$ weights and the single weight used at any given instant will depend on the rank of $x_i$. Recalling that $r_i$ is the rank of $x_i$, the weight used at each instant is $w_{i,(r_i)}$. Thus, each observation sample is weighted according to both its temporal- and rank-order.

    The $N$ weight vectors can be stacked to form a single PWOS weight vector,

    $$ \mathbf{W} = [\mathbf{W}_1^T \mid \mathbf{W}_2^T \mid \cdots \mid \mathbf{W}_N^T]^T. $$

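    The appropriate weights from $\mathbf{W}$ (only $N$ weights are used at any given time) can be selected using $\mathbf{P}$. Once the weights are selected, the output of a PWOS filter is found in an analogous manner to the WOS filter output. Formally, the PWOS output is defined as

    $$ \begin{aligned} y(n) &= W_0\text{th Largest}\,[\mathbf{x}^T \diamond \mathbf{W}^T \mathbf{P}] \\ &= W_0\text{th Largest}\,[x_1 \diamond \mathbf{W}_1^T \mathbf{P}_1,\; x_2 \diamond \mathbf{W}_2^T \mathbf{P}_2,\; \ldots,\; x_N \diamond \mathbf{W}_N^T \mathbf{P}_N] \\ &= W_0\text{th Largest}\,[x_1 \diamond w_{1,(r_1)},\; x_2 \diamond w_{2,(r_2)},\; \ldots,\; x_N \diamond w_{N,(r_N)}]. \end{aligned} $$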

    Thus, each input sample is weighted according to its temporal- and rank-order and the $W_0$th largest sample is chosen as the output from the expanded set. Since the weight of each sample depends on the temporal- and rank-order of one sample (itself), this filter is said to use $L^1\ell$ temporal/rank information and reside at the $L^1\ell$ location on the $L\ell$ lattice, which is defined shortly.

    The following example illustrates the operation of PWOS filtering. The weights in the example are integer-valued. However, like WOS filters, PWOS filter weights need only be positive. We give only an integer-valued weight PWOS example, as the output for real-valued weights is found similarly to the WOS case.

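    Example: Consider the window size 3 PWOS filter with threshold $W_0$. Let $\mathbf{x} = [x_1, x_2, x_3]$ be an observation with $x_2 < x_3 < x_1$; then $\mathbf{x}_L = [x_{(1)}, x_{(2)}, x_{(3)}] = [x_2, x_3, x_1]$. Let the PWOS weight vector be

    $$ \mathbf{W} = [w_{1,(1)}, w_{1,(2)}, w_{1,(3)} \mid \cdots \mid w_{3,(1)}, w_{3,(2)}, w_{3,(3)}]^T. $$

    From the observation vector, we can compute the matrix $\mathbf{P}$. The rank indicator vectors for $\mathbf{x}$ are

    $$ \mathbf{R}_1 = [0, 0, 1]^T,\; r_1 = 3, \qquad \mathbf{R}_2 = [1, 0, 0]^T,\; r_2 = 1, \qquad \mathbf{R}_3 = [0, 1, 0]^T,\; r_3 = 2. $$

    The weights obtained for the replication of the input samples are computed via $\mathbf{W}^T \mathbf{P}$, which evaluates to

    $$ \mathbf{W}^T \mathbf{P} = [w_{1,(1)}, w_{1,(2)}, w_{1,(3)} \mid w_{2,(1)}, w_{2,(2)}, w_{2,(3)} \mid w_{3,(1)}, w_{3,(2)}, w_{3,(3)}] \left[ \begin{array}{ccc|ccc|ccc} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{array} \right]^T = [w_{1,(3)}, w_{2,(1)}, w_{3,(2)}]. $$

    The output of the filter is

    $$ \begin{aligned} y(n) &= W_0\text{th Largest}\,[\mathbf{x}^T \diamond \mathbf{W}^T \mathbf{P}] \\ &= W_0\text{th Largest}\,[x_1 \diamond w_{1,(3)},\; x_2 \diamond w_{2,(1)},\; x_3 \diamond w_{3,(2)}], \end{aligned} $$

    that is, the $W_0$th largest element of the expanded set in which $x_1$ appears $w_{1,(3)}$ times, $x_2$ appears $w_{2,(1)}$ times, and $x_3$ appears $w_{3,(2)}$ times.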

    The advantage of considering both temporal- and rank-order when assigning weights is that outliers can be detected and given a smaller weight. This is illustrated in the accompanying figure, which shows optimal PWOS filter weights plotted as a mesh function of temporal- and rank-order. The input to this filter was an image corrupted by heavy-tailed points. As the figure shows, the samples given the most weight are centrally located in both time and rank. This makes intuitive sense, as the central temporal samples are expected to be more correlated with the desired center sample than those which are temporally distant. Similarly, samples that lie in the extreme ranks may be outliers and should be given smaller weight. Next, we extend temporal/rank coupling to include more than one sample.
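    A PWOS filter of this kind can be sketched directly from the definition: look up each sample's rank, select the weight indexed by temporal position and rank, replicate, and take the W_0th largest element. The weight table below is hypothetical (it is not the optimized set shown in the figure). It gives the most weight to the sample that is central in both time and rank and a small weight to extreme ranks. The function names and the tie-breaking rule are likewise assumptions.

```python
def sample_ranks(window):
    """1-based rank of each temporal sample; ties broken by position."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for k, i in enumerate(order, start=1):
        ranks[i] = k
    return ranks


def pwos_filter(window, weight_table, w0):
    """PWOS output with integer weights.
    weight_table[i][k] plays the role of w_{i+1,(k+1)}: the weight applied
    to temporal sample i + 1 when it has rank k + 1."""
    ranks = sample_ranks(window)
    expanded = []
    for i, sample in enumerate(window):
        expanded.extend([sample] * weight_table[i][ranks[i] - 1])
    expanded.sort(reverse=True)
    return expanded[w0 - 1]


weight_table = [[1, 2, 1],   # hypothetical weights for x_1 at ranks 1, 2, 3
                [1, 3, 1],   # hypothetical weights for x_2 (center sample)
                [1, 2, 1]]   # hypothetical weights for x_3
print(pwos_filter([1, 100, 2], weight_table, 2))
print(pwos_filter([1, 2, 3], weight_table, 2))
```

    When the center sample is an outlier it lands in an extreme rank and receives the small weight, so it is no longer emphasized merely because of its location in the window.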

    $L^j\ell$ PWOS filters

    The weighting scheme derived in the previous section can be extended to include information on the rank-order of multiple samples. Thus, each input sample can be weighted according to not only its temporal- and rank-order, but also the rank-order of its neighbors. This scheme allows the ranks of adjacent samples to be compared during the weighting process. Through such comparisons, it can be better determined if a sample is truly an outlier. For instance, if two adjacent samples both have high rank, then they may simply be samples which crossed an edge. If only one sample has high rank, then with higher probability it is an outlier. To take advantage of neighboring rank information, a general $L^j\ell$ rank coupling technique is developed next.


    [Figure: mesh plot of filter weight versus temporal-order (axis 1 to 9) and rank-order (axis 1 to 9).]

    Figure: Optimal window size 9 PWOS filter weights plotted as a mesh function of temporal- and rank-order. The greatest weight is given to those samples that are centrally located in both time and rank.

    In the previous section rank indicators were used to characterize the rank of each (temporal) sample $x_i$. Suppose that we want to jointly characterize the ranks of two input samples, $x_i$ and $x_{i \oplus 1}$. If the rank indicator vector for $x_i$, $\mathbf{R}_i$, is given, then we can form an additional indicator vector for $x_{i \oplus 1}$ that does not contain the information provided in $\mathbf{R}_i$. This vector, denoted by $\mathbf{R}^1_i$, is the $N - 1$ length reduced indicator vector formed by removing the $r_i$th element from $\mathbf{R}_{i \oplus 1}$. Thus, $\mathbf{R}_i$ gives the rank of $x_i$ and $\mathbf{R}^1_i$ gives the rank of $x_{i \oplus 1}$, given that we know the rank of $x_i$ already. We can extend this concept to more than two samples. Associated with the $i$th input sample $x_i$, the reduced rank indicator $\mathbf{R}^a_i$ is formed by removing the $r_i$th, $r_{i \oplus 1}$th, $\ldots$, $r_{i \oplus (a-1)}$th elements from the vector $\mathbf{R}_{i \oplus a}$, where $\oplus$ denotes Modulo $N$ addition, $i \oplus a = (i + a) \text{ Mod } N$. For example, if $\mathbf{x} = [x_1, x_2, x_3, x_4]$ with $\mathbf{x}_L = [x_4, x_2, x_1, x_3]$, then the rank indicator vectors and their respective rank parameters are

    $$ \begin{aligned} \mathbf{R}_1 &= [0, 0, 1, 0]^T, & r_1 &= 3, & \mathbf{R}_2 &= [0, 1, 0, 0]^T, & r_2 &= 2, \\ \mathbf{R}_3 &= [0, 0, 0, 1]^T, & r_3 &= 4, & \mathbf{R}_4 &= [1, 0, 0, 0]^T, & r_4 &= 1. \end{aligned} $$

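    The reduced rank indicator vectors $\mathbf{R}^1_1$ and $\mathbf{R}^2_1$ are, for example,

    $$ \mathbf{R}^1_1 = [0, 1, 0]^T, \qquad \mathbf{R}^2_1 = [0, 1]^T, $$

    where the $r_1$th (third) sample was removed from $\mathbf{R}_{1 \oplus 1} = \mathbf{R}_2 = [0, 1, 0, 0]^T$ to get $\mathbf{R}^1_1$ and where the $r_1$th and $r_2$th (third and second) samples were deleted from $\mathbf{R}_{1 \oplus 2} = \mathbf{R}_3 = [0, 0, 0, 1]^T$ to get $\mathbf{R}^2_1$.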
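    The reduced rank indicators can be computed mechanically from the rank parameters. The sketch below assumes the cyclic Modulo N coupling on the index set {1, ..., N} and takes the ranks 3, 2, 4, 1 of the example as input. The helper names `oplus` and `reduced_indicator` are hypothetical.

```python
def oplus(i, a, n):
    """Modulo-N addition on the index set {1, ..., N}, so that
    oplus(N, 0, N) = N and oplus(N, 1, N) = 1."""
    return (i + a - 1) % n + 1


def reduced_indicator(ranks, i, a):
    """Reduced rank indicator R^a_i: the rank indicator of sample i (+) a
    with the positions r_i, r_{i(+)1}, ..., r_{i(+)(a-1)} removed.
    For a = 0 this is the ordinary rank indicator R_i."""
    n = len(ranks)
    target = oplus(i, a, n)
    indicator = [1 if k == ranks[target - 1] else 0 for k in range(1, n + 1)]
    removed = {ranks[oplus(i, b, n) - 1] for b in range(a)}
    return [bit for k, bit in enumerate(indicator, start=1) if k not in removed]


ranks = [3, 2, 4, 1]                      # r_1, ..., r_4 from the example
print(reduced_indicator(ranks, 1, 1))     # R^1_1
print(reduced_indicator(ranks, 1, 2))     # R^2_1
```

    Each removal shortens the vector by one, so R^a_i has length N - a and still contains a single 1.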

    Footnote: The Modulo $N$ operation defined here is on the group $\{1, 2, \ldots, N\}$, such that $N \text{ Mod } N = N$, and $(N + 1) \text{ Mod } N = 1$. The ranks can, of course, be coupled in a fashion other than the cyclical Modulo $N$ method used here, e.g., the next sample coupled to $x_i$ could be that of minimum temporal distance from $x_i$, resulting in coupling progressions $x_i, x_{i+1}, x_{i-1}, x_{i+2}, \ldots$. Such couplings result in similar filter structures and results. For simplicity, we use the notationally simple cyclic Modulo $N$ coupling here.


    The rank indicator vectors $\mathbf{R}_i, \mathbf{R}^1_i, \ldots, \mathbf{R}^{j-1}_i$ can be used to express the ranks of $j$ consecutive samples starting at $x_i$. The rank permutation indicator associated with the $i$th input sample $x_i$ is defined as

    $$ \mathbf{P}^j_i = \mathbf{R}_i \otimes \mathbf{R}^1_i \otimes \cdots \otimes \mathbf{R}^{j-1}_i $$

    for $1 \le j \le N$, where $\otimes$ denotes the matrix Kronecker product. Note that the vector $\mathbf{P}^j_i$ has length $N(N-1)\cdots(N-j+1) = P^j_N$. The indicator vector $\mathbf{P}^j_i$ characterizes the relative ranking of the samples $x_i, x_{i \oplus 1}, \ldots, x_{i \oplus (j-1)}$. Thus, P�i contains no ran