Short Papers
Genetic Algorithm Wavelet Design for Signal Classification

Eric Jones, Member, IEEE, Paul Runkle, Member, IEEE, Nilanjan Dasgupta, Student Member, IEEE, Luise Couchman, and Lawrence Carin, Fellow, IEEE
Abstract—Biorthogonal wavelets are applied to parse multiaspect transient scattering data in the context of signal classification. A language-based genetic algorithm is used to design wavelet filters that enhance classification performance. The biorthogonal wavelets are implemented via the lifting procedure and the optimization is carried out using a classification-based cost function. Example results are presented for target classification using measured scattering data.

Index Terms—Genetic algorithms, wavelets, classification.
1 INTRODUCTION
WAVELET design is a problem that has attracted significant attention over the last decade. Tewfik et al. [1] designed orthogonal wavelets based on computing bounds on cost functions, the latter based on either minimizing the error between the original and approximate signal representation, or on maximizing the norm of the projection of the signal onto the wavelet space. Oslick et al. [2] have developed a paradigm for the design of general biorthogonal wavelets with this construct amenable to a cost function. These and other wavelet design paradigms are based on frequency-domain filter characteristics. As an alternative, Sweldens [3] has developed a general scheme for designing biorthogonal wavelets, implemented directly in the time domain. This formalism, termed "lifting," yields a simple technique for insertion into a general cost function. For example, Claypoole et al. [4] have developed several techniques for adaptive wavelet design based on lifting. In [4], an efficient design solution was based on a linearly constrained least-squares minimization of the signal-representation error on each wavelet level.
While the wavelet design techniques in [1], [4] have clear utility in the context of compression, there are other applications for which alternative cost functions are desirable. In this paper, we are interested in signal classification based on wavelet-based feature parsing. While error minimization of the wavelet representation may be a salutary goal, it does not address the ultimate objective of improved classification performance. The optimal choice of wavelets for signal classification depends on the details of the signal classes and on the classifier. In many cases, the classifier is too complicated to allow a direct solution for the optimal wavelet representation, suggesting the use of a genetic algorithm (GA) [5] for cost-function optimization. This approach is pursued in this paper, and the classification performance of the GA-designed wavelets is compared to that of classical (Cohen et al. [6]) wavelets, as well as to wavelets designed by minimizing the error in the wavelet representation [4]. It is important to note that the cost function employed in [4] allowed a direct solution, without the need for a GA. However, that cost function does not permit one to address the ultimate goal of improved classification. The cost function introduced here is based explicitly on classifier performance, which does not permit a direct solution for the optimal wavelets. Therefore, we have employed this classification-based cost function in a GA.
2 LIFTING-BASED WAVELET DESIGN
The discrete wavelet transform (DWT) is computed efficiently via a recursive multirate filterbank [7]. At each scale j, the filterbank decomposes the signal into low-pass and high-pass components through convolution (and subsequent decimation) with FIR filters h and g, respectively. The DWT representation is composed of scaling coefficients, c_0(k), representing coarse or low-pass signal information at scale j = 0, and wavelet coefficients, d_j(k), representing signal detail at scales j = 1, ..., J. Formally,

$$c_j(k) = \sum_m h(2k - m)\, c_{j+1}(m), \qquad d_j(k) = \sum_m g(2k - m)\, c_{j+1}(m), \tag{1}$$

where the original discrete-time signal is given by c_J(k), of length 2^J samples. At the jth level, both c_j(k) and d_j(k) are composed of 2^j samples, forming a tree-like relationship between the coefficients at successive scales. Signal reconstruction may be effected through the application of the inverse DWT [7].
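As an illustration of (1), one analysis level of the filterbank can be sketched in a few lines of NumPy: the full convolutions with h and g are simply sampled at even lags. The Haar filters in the demo are only a sanity check, not the filters designed in this paper, and the helper name `dwt_level` is ours.

```python
import numpy as np

def dwt_level(c_next, h, g):
    """One analysis level of (1): convolve with h and g, keep even lags.

    c_next : scaling coefficients c_{j+1} at the finer scale
    h, g   : low-pass and high-pass analysis filters
    """
    coarse = np.convolve(h, c_next)[::2]   # c_j(k) = sum_m h(2k - m) c_{j+1}(m)
    detail = np.convolve(g, c_next)[::2]   # d_j(k) = sum_m g(2k - m) c_{j+1}(m)
    return coarse, detail

# Haar filters, used purely as a sanity check
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)
coarse, detail = dwt_level(np.array([1.0, 1.0, 2.0, 2.0]), h, g)
```

Note that decimating the full convolution makes the boundary handling explicit; a production implementation would pick a signal-extension convention.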
Although standard wavelet families are well-suited for analysis of general signals, it is also possible to design wavelet transforms that are adapted to the signals of interest. Rather than use an orthogonal wavelet basis, we choose biorthogonal wavelets, which allow more flexibility in the system design. Among the advantages of a biorthogonal system are that the filters h and g (in (1)) need not be of the same length, so that the parameters may be repartitioned to meet specific design constraints. In this paper, we exploit an alternative architecture to the multirate analysis filterbank, known as the lifting scheme [8]. It has been shown that any set of wavelet and scaling filters, including those associated with a biorthogonal basis, may be decomposed in terms of the lifting structure, which is a lattice-type realization of the multirate filterbank [8].
At each scale, the lifting scheme is implemented in the following manner: The signal under analysis, c_{j+1}(k), is split into its even and odd components c_{j+1}(2k) and c_{j+1}(2k+1), analogous to the decimation operation in the standard multirate filterbank. An FIR filter, p, of order N_p is used to predict the odd components as a linear combination of the even. The wavelet coefficients may be identified as the "detail" in the higher-resolution data that is not predicted by the even component of the analysis signal:

$$d_j(k) = c_{j+1}(2k+1) - \sum_m p(m)\, c_{j+1}\bigl(2(m + k - n_o)\bigr), \tag{2}$$

where n_o = (N_p - 2)/2 is a temporal shift that properly aligns the wavelet coefficients according to the prediction-filter order. From (1) and (2), g is expressed in terms of the coefficients of p:

$$g(2k) = -p(k), \qquad g(2k+1) = \delta(k - n_o), \tag{3}$$

with k = 0, ..., N_p - 1. From the necessary conditions imposed on the high-pass filter [8], we have \sum_k p(k) = 1, leaving N_p - 1 degrees of freedom for the prediction filter.
890 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 23, NO. 8, AUGUST 2001
. E. Jones, P. Runkle, N. Dasgupta, and L. Carin are with the Department of Electrical and Computer Engineering, Duke University, Box 90291, Durham, NC 27708-0291. E-mail: [email protected].
. L. Couchman is with the Naval Research Laboratory, Physical Acoustics, Code 7130, Washington, DC 20375-5000. E-mail: [email protected].
Manuscript received 12 Apr. 2000; revised 30 Nov. 2000; accepted 9 Feb. 2001. Recommended for acceptance by A. Kundu. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 111896.
0162-8828/01/$10.00 © 2001 IEEE
Following the prediction step is an "update," which augments the even component of the analysis signal with a combination of the detail to obtain a coarse approximation to the original signal. The update operation utilizes an FIR filter, u, of order N_u:

$$c_j(k) = c_{j+1}(2k) + \sum_m u(m)\, d_j(m + k - n_1), \tag{4}$$

where n_1 = (N_p + N_u + 2)/2 provides the proper temporal shift to align the coarse coefficients in accord with the prediction- and update-filter orders. From (1) and (4), h is a combination of the prediction and update filters:

$$h(2k) = \delta(k - n_1) - \sum_m p(m)\, u(k - m), \qquad h(2k+1) = u(k). \tag{5}$$

A necessary condition on h, to satisfy the wavelet multiresolution property, is a partitioning of unity [8]:

$$\sum_k h(2k) = \sum_k h(2k+1). \tag{6}$$

This yields the necessary condition \sum_k u(k) = 1/2. The number of degrees of freedom available for the adaptive multirate filter design at each scale is N_df = N_p + N_u - 2.
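The split/predict/update recipe of (2) and (4) can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the alignment shifts n_o and n_1 are replaced by simple circular indexing (`np.roll`), so boundary handling is periodic, and the helper name `lifting_analysis` is ours.

```python
import numpy as np

def lifting_analysis(c_next, p, u):
    """One lifting level: split, predict, update (periodic edges)."""
    even, odd = c_next[::2], c_next[1::2]
    # predict: detail = odd samples minus a p-weighted combination of evens
    detail = odd - sum(p[m] * np.roll(even, -m) for m in range(len(p)))
    # update: coarse = even samples plus a u-weighted combination of details
    coarse = even + sum(u[m] * np.roll(detail, -m) for m in range(len(u)))
    return coarse, detail

# single-tap filters satisfying sum(p) = 1 and sum(u) = 1/2 (Haar-like)
coarse, detail = lifting_analysis(np.arange(8.0),
                                  p=np.array([1.0]), u=np.array([0.5]))
```

On the linear ramp input, the single-tap predictor leaves a constant detail of 1 at every position, since a one-coefficient p cannot annihilate a first-order polynomial.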
Within the lifting paradigm, there are several possible design strategies over the remaining N_df parameters governing the biorthogonal basis. One such strategy is to suppress all polynomials of order lower than N_p - 1 at the output of g, and pass all polynomials of order N_u - 1 through the filter h. This approach maximizes the smoothness of the coarse coefficients, while enabling the detail coefficients to represent high-pass information; such a design procedure yields the symmetric biorthogonal Cohen, Daubechies, and Feauveau (CDF) wavelets [6]. In the lifting framework, these constraints are satisfied through the solution of linear equations over the space of p and u, while the constraints imposed on g and h are not as straightforward [1]. In lieu of applying all N_df degrees of freedom to smoothness constraints, one can use some of the available parameters to match the biorthogonal wavelet to the signal(s) of interest. For example, in [4], some of the available parameters are employed to enforce smoothness constraints, while the remainder are used to match (in a least-square-error sense) the biorthogonal wavelet to signals of interest.
3 WAVELET FEATURES
3.1 Features
The principal focus of this paper involves lifting-based wavelet construction via a GA design procedure, the latter employing a cost function linked directly to the classification problem. As demonstrated in Section 4, the GA procedure is applicable to general wavelet features and to a general classifier. Therefore, for demonstration of the basic principles here, we employ simple wavelet features. In particular, our features are based on moments of the normalized wavelet coefficients. These features characterize the envelope shape for a particular level (scale) of wavelet coefficients. Similar feature sets have been used in wavelet-based texture classification systems [9], which represent the temporal/spatial structure of the wavelet coefficients at each scale j. This approach provides a compact representation of the wavelet coefficients and is relatively robust to uncertainty in the training data. Also, while the wavelet transform itself is not shift-invariant, the shape of the envelope remains relatively constant for arbitrary shifts.
Prior to feature extraction, the detail coefficients at scale j, d_j(k), are squared and normalized to form

$$w_j(k) = \frac{d_j^2(k)}{z_j}, \qquad z_j = \sum_k d_j^2(k). \tag{7}$$

The signal w_j(k) (denoted w_j) is defined to satisfy the necessary conditions of a probability mass function. The moments about the mean of w_j are given by

$$m_{rj} = \sum_k (k - m_{1j})^r\, w_j(k), \qquad m_{1j} = \sum_k k\, w_j(k). \tag{8}$$
The feature set used in this paper is composed of the variance (breadth), skewness (asymmetry), and kurtosis (peakedness) of w_j. These parameters are derived from m_{rj}, r = 2, 3, 4, j = 1, ..., J (we typically only use features from L < J wavelet levels).
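Equations (7) and (8) translate directly into code. The sketch below uses the conventional normalizations skewness = m_3/m_2^{3/2} and kurtosis = m_4/m_2^2; the paper does not spell out its normalization, so treat those two ratios as an assumption, along with the helper name `moment_features`.

```python
import numpy as np

def moment_features(d_j):
    """Variance, skewness, kurtosis of the envelope w_j of (7)-(8)."""
    w = d_j**2 / np.sum(d_j**2)             # (7): w_j behaves like a pmf over k
    k = np.arange(len(w))
    m1 = np.sum(k * w)                      # mean location of the envelope
    m = lambda r: np.sum((k - m1)**r * w)   # central moments m_rj of (8)
    return m(2), m(3) / m(2)**1.5, m(4) / m(2)**2

var, skew, kurt = moment_features(np.array([1.0, 2.0, 1.0]))
```

A symmetric envelope, as in the demo, yields zero skewness regardless of the coefficient magnitudes.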
3.2 Statistical Model for Features
If we consider L wavelet levels, the moment-based features discussed above yield a 3L-dimensional feature vector, v. The statistical distribution of these feature vectors is characterized via vector quantization (VQ) trained using a K-means algorithm [10], with the features mapped onto integers corresponding to the nearest-neighbor codebook element: k = Q(v), k ∈ {1, ..., K}, for a K-element codebook. In the work presented here, we are interested in transient scattering from a general target, with such scattering typically a strong function of the target-sensor orientation [11]. Therefore, we define target states [11], with each state characteristic of a set of contiguous target-sensor orientations for which the scattering is relatively stationary [11]. A VQ-based classifier is designed for each state, with this construct motivated by previous research, in which these states are employed in a hidden Markov model (HMM) [11].
Given target states S_m, m = 1, ..., M, we first define a VQ codebook using feature vectors originating from all states. Given this codebook, the feature vectors from state S_m are used to define a state-dependent probability mass function for the codebook elements, p(Q(v) = k | S_m), which we abbreviate as p(v | S_m). VQ was selected over other statistical feature models (such as Gaussian mixtures [12]) due to the algorithm's simplicity and the fact that our training data set (see Section 5) was relatively small. Prior to application of the K-means algorithm, the features are normalized such that a Euclidean-distance metric applied across the heterogeneous feature dimensions is appropriate.
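A minimal sketch of the state-dependent pmf estimate p(Q(v) = k | S_m): quantize the training vectors of one state against the shared codebook and histogram the resulting indices. The codebook and vectors here are toy values, and the helper name `state_pmf` is ours, not the paper's.

```python
import numpy as np

def state_pmf(codebook, vectors, K):
    """Histogram of nearest-codebook indices for one state's vectors."""
    # nearest-neighbor quantization under the Euclidean metric
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    idx = np.argmin(dists, axis=1)
    counts = np.bincount(idx, minlength=K).astype(float)
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
vecs = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.2]])
pmf = state_pmf(codebook, vecs, K=2)
```

In practice one would smooth the counts (e.g., add a small floor) so that codebook elements unseen in training do not receive exactly zero likelihood.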
4 GENETIC ALGORITHM IMPLEMENTATION
Genetic algorithms (GAs) constitute an optimization technique based on the "survival of the fittest" paradigm found in nature. The fundamentals of traditional GAs are well covered in [5]. Also, [13], [14] cover a GA variant called genetic programming that is relevant to the methods used here. Genetic algorithms work with an abstract representation of a design called a chromosome. Here, we choose a tree structure that is compatible with language-based optimization [13], [14], [15], [16]. A classifier language describes the architecture of the classifier. The dictionary of words, or lexicon, for the language defines the components, subcomponents, and numerical parameters necessary to build a wavelet-based classifier. The language's grammar defines how these pieces are connected together. Newly generated classifier designs must be grammatically correct in order to be valid systems.
4.1 Classifier Language
The problem at hand assumes that we have M states S_m, each state defined by an associated ensemble of transient scattered waveforms. Each state is representative of a set of target-sensor orientations over which the scattering physics is stationary. Feature vector v is classified as associated with state S_m if p(v | S_m) > p(v | S_k) for all k ≠ m (maximum-likelihood discrimination). Our goal is to design distinct wavelet filters for each state S_m, such that the likelihood of classifying a scattered waveform with the correct state is maximized. The cost function employed in the GA discussed below maximizes the minimum probability of correct classification along the confusion-matrix diagonal [15].
While it can easily generalize to M states, the following grammar (see Fig. 1) defines how the individual components of a two-state classifier fit within the GA design. A classifier is made up of two state_identifiers and a maximum-likelihood processor, max. Each state_identifier has two subcomponents, a feature_extractor and a statistical model. The feature_extractor also has two parts, a lifter_transform and the moment_features block. The statistical model used is vector_quantization with 30 codebook elements.

    ST          → classifier IDENTIFIERS LIKE_PROCR
    IDENTIFIERS → list_join IDENTIFIER IDENTIFIER
    LIKE_PROCR  → max
    IDENTIFIER  → state_identifier EXTRACTOR STAT_MODEL
    EXTRACTOR   → feature_extractor LIFTER FEATURES
    LIFTER      → lifter_transform LIFTER_STAGES
    FEATURES    → moment_features
    STAT_MODEL  → DISCRETE
    DISCRETE    → vector_quantization CODE_COUNT
    CODE_COUNT  → 30.                                        (9)

Note that the list_join operator simply groups a set of objects together. Here, it groups two state_identifier objects. Later, it is used to combine numbers into a list of filter coefficients.
In (9), which is written in Backus-Naur form [16], the → symbol indicates transformation or substitution. Symbols on the left of the arrow can be transformed into symbols on the right-hand side of the arrow. The | symbol is the "or" operator. It indicates that the left-hand-side symbol can be transformed into (replaced by) any of the rules on the right-hand side. Uppercase symbols in the rules are nonterminal symbols and lowercase bold symbols are terminal symbols. The derived structure can only contain terminal symbols. Nonterminal symbols are used to define intermediate steps in the process of generating a valid sentence. Whenever a nonterminal is transformed into a rule that itself contains a nonterminal(s), one of the rules for that symbol is applied. All grammars have a start symbol that indicates which transformation in the grammar to begin with when generating a sentence. The start symbol here is ST.

The grammar in (9) is very rigid in that it does not allow for variations in the system, i.e., there is only a single choice for the likelihood processor (max), the statistical model (vector_quantization), and all the other components in the system. However, the LIFTER_STAGES symbol has not yet been defined. The GA allows variation in the number of lifter levels L (three or four), the length of the p and u filters in each stage (four or five), and allows the p and u filter coefficients to have any value in the range (-1, 1).¹
    LIFTER_STAGES → list_join STAGE STAGE2
    STAGE2        → list_join STAGE STAGE3
    STAGE3        → list_join STAGE STAGE4 | STAGE
    STAGE4        → STAGE
    STAGE         → lifter_stage P U
    P             → COEFFICIENTS
    U             → COEFFICIENTS
    COEFFICIENTS  → list_join NUMBER COEFFICIENTS2
    COEFFICIENTS2 → list_join NUMBER COEFFICIENTS3
    COEFFICIENTS3 → list_join NUMBER COEFFICIENTS4
    COEFFICIENTS4 → list_join NUMBER COEFFICIENTS5 | NUMBER
    COEFFICIENTS5 → NUMBER
    NUMBER        → float(-1, 1).                            (10)

Combining the grammars in (9) and (10) yields the complete grammar for the classifier language. Fig. 1 shows one of the many classifier chromosomes generated from the grammar. O'Neill and Ryan [17] have considered a related approach, employing string chromosomes rather than the tree chromosomes used here.
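One way to see how a chromosome is generated from such a grammar is to expand nonterminals recursively, choosing one production at random at each step. The sketch below encodes only the LIFTER_STAGES productions of (10), with the coefficient-list recursion collapsed to a single NUMBER for brevity; the rule encoding and the `derive` helper are our own illustration, not the authors' code.

```python
import random

# Productions: each nonterminal maps to a list of alternative right-hand sides.
RULES = {
    "LIFTER_STAGES": [["list_join", "STAGE", "STAGE2"]],
    "STAGE2": [["list_join", "STAGE", "STAGE3"]],
    "STAGE3": [["list_join", "STAGE", "STAGE4"], ["STAGE"]],
    "STAGE4": [["STAGE"]],
    "STAGE": [["lifter_stage", "P", "U"]],
    "P": [["COEFFICIENTS"]],
    "U": [["COEFFICIENTS"]],
    "COEFFICIENTS": [["NUMBER"]],   # coefficient-list recursion elided
}

def derive(symbol, rng):
    """Expand one symbol into a derivation tree; terminals are lowercase."""
    if symbol not in RULES:                    # terminal symbol
        if symbol == "NUMBER":
            return rng.uniform(-1.0, 1.0)      # float in (-1, 1)
        return symbol
    rhs = rng.choice(RULES[symbol])            # pick one production at random
    return [symbol] + [derive(s, rng) for s in rhs]

tree = derive("LIFTER_STAGES", random.Random(1))
```

The STAGE3 alternative is what lets the GA choose between three and four lifter levels.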
Fig. 1. Chromosome for a two-state classifier. Gray tags indicate the grammatical type of each node. Information for the coefficient arrays for the p and u filters has been omitted to conserve space.
1. While the GA can choose any numerical value for the p and u coefficients, the values are always postprocessed to enforce \sum_k p(k) = 1 and \sum_k u(k) = 1/2. This is done by subtracting the appropriate DC offset from the filter coefficients.
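The postprocessing in footnote 1 amounts to subtracting a uniform DC offset from each filter so the required sums hold; a sketch (the helper name `enforce_sums` is ours):

```python
import numpy as np

def enforce_sums(p, u):
    """Subtract a uniform DC offset so sum(p) = 1 and sum(u) = 1/2."""
    p = p - (np.sum(p) - 1.0) / len(p)
    u = u - (np.sum(u) - 0.5) / len(u)
    return p, u

p, u = enforce_sums(np.array([0.4, 0.4, 0.4, 0.4]),
                    np.array([0.1, 0.1, 0.1, 0.1]))
```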
4.2 Crossover for Tree Chromosomes
Breeding two chromosomes is done using the crossover operator. In crossover, a node is selected from the tree of each of two parent chromosomes. These nodes, along with their complete subtrees, are then swapped between the parents to form two new child chromosomes. In order for the children to be valid classifier systems, i.e., grammatically correct trees, crossover can only occur between nodes that have the same grammatical type. There is a vast literature on such crossovers; the reader is referred to [18], [19], [20].
Every node within a tree is a possible crossover site. This can be
undesirable for several reasons. First, a large percentage of the nodes
in a tree are leaf nodes (half in full binary trees). As a result, a high
percentage of the crossover operations occur at leaf nodes and
exchange only a single node between the parents. Disallowing
crossover between grammatical types that only occur at leaf nodes
(such as NUMBER) forces crossovers to occur at higher level nodes
and increases the average amount of information exchanged
between parents. Second, there are portions of the tree chromosome
that are identical in all trees. For instance, because of the grammar
definition in (9), all STAT_MODEL subtrees are identical in every
state_identifier. Crossover at a STAT_MODEL typed node gen-
erates offspring that are clones of their parents. Removing such
nodes from the list of possible crossover sites increases the chance
that offspring have different designs than their parents.
As detailed in Section 5, a small percentage of the numerical
parameters are mutated each generation, using Gaussian mutation
with a standard deviation equal to a selected percentage of the
parameter's value [21].
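Grammatically typed subtree crossover can be sketched as below: only nodes whose type occurs in both parents and is not on an exclusion list (leaf-only types such as NUMBER, invariant types such as STAT_MODEL) are eligible sites, and the subtrees under two same-typed nodes are exchanged. The `Node` class and all names are illustrative, not the authors' code.

```python
import copy
import random

class Node:
    """Tree-chromosome node carrying its grammatical type."""
    def __init__(self, gtype, children=None):
        self.gtype = gtype
        self.children = children or []

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()

# Types excluded as crossover sites: leaf-only types (e.g., NUMBER) and
# types whose subtrees are identical in every chromosome (e.g., STAT_MODEL).
EXCLUDED = {"NUMBER", "STAT_MODEL"}

def crossover(parent_a, parent_b, rng=random):
    """Swap the subtrees under two randomly chosen same-typed nodes."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    shared = ({n.gtype for n in child_a.walk()}
              & {n.gtype for n in child_b.walk()}) - EXCLUDED
    if not shared:
        return child_a, child_b          # no compatible site: clone parents
    site_a = rng.choice([n for n in child_a.walk() if n.gtype in shared])
    site_b = rng.choice([n for n in child_b.walk() if n.gtype == site_a.gtype])
    site_a.children, site_b.children = site_b.children, site_a.children
    return child_a, child_b
```

Because swaps only occur between same-typed nodes, the combined multiset of grammatical types across the two offspring equals that across the two parents.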
5 EXAMPLE RESULTS
5.1 Preliminaries
We consider time-domain acoustic scattering from a submerged elastic target. The details of the measurement and of the target are found in [22]. As discussed above, the backscattered signal from such a target is a strong function of the target-sensor orientation. However, one can define states S_m over which the transient scattered signal is stationary [11] as a function of aspect. In the work presented here, we consider M = 4 states. Each scattered waveform is parsed via a biorthogonal wavelet transform and three moments are computed for each wavelet level (scale), as discussed in Section 3.1. The likelihood p(v | S_k) is quantified here via K-means vector quantization (VQ) [10] and, as indicated in (9), we consider a 30-element codebook. The codebook for each statistical model is generated using five noisy realizations of all four states, giving a total of approximately 1,000 training vectors. A distinct biorthogonal wavelet is designed for each of the M = 4 states. Our goal is to design four wavelet filters, matched to the corresponding target states, that maximize classification performance.
Each GA chromosome is analyzed as follows:
1. A classifier system is built based on the blueprint found in the chromosome.
2. The four state-dependent statistical models are trained to recognize their respective states, using five realizations of the noisy data set (as mentioned above, constituting approximately 1,000 training vectors).
3. The classifier is tested using a new set of 15 noise realizations.
4. The chromosome fitness, a quantity that characterizes the quality of the individual, is calculated.
In this context, the performance of the classifier is characterized with a confusion matrix. The confusion matrix can be reduced to a fitness value in a variety of ways. For example, one can use the average classification rate. Alternatively, one can choose the worst classification rate as the fitness value, effectively forcing all states to have an acceptable classification rate. The chosen fitness function combines the two approaches: Fitness = min(C) + a · mean(C). Here, C is the diagonal of the confusion matrix, and a is a weighting factor. A value of a = 0.2 has proven to yield results that balance minimum performance standards with high overall correct classification rates.
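The chosen fitness function is a one-liner over the confusion-matrix diagonal; the matrix below is fabricated purely to exercise the code and is not a result from the paper.

```python
import numpy as np

def fitness(confusion, a=0.2):
    """Fitness = min(C) + a * mean(C), C = diagonal of the confusion matrix."""
    C = np.diag(confusion)
    return C.min() + a * C.mean()

# toy confusion matrix (rows: true state, columns: declared state)
conf = np.array([[0.90, 0.10, 0.00, 0.00],
                 [0.10, 0.80, 0.10, 0.00],
                 [0.00, 0.10, 0.85, 0.05],
                 [0.00, 0.00, 0.05, 0.95]])
score = fitness(conf)
```

With a = 0.2, the min term dominates, so a chromosome cannot score well by sacrificing any single state.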
5.2 Genetic Algorithm Parameters
A steady-state GA [5] with a single population of 150 individuals was evolved for 30 generations using a crossover rate of 80 percent and a replacement rate of 90 percent. A small percentage (3 percent) of the numerical parameters were mutated each generation using Gaussian mutation with a standard deviation equal to 10 percent of the parameter's value. Mutation was not used to alter the tree topology. The GA parameters were chosen based on success in past applications [23]. The population size was chosen based on computational resources. The GA was run on a cluster of workstations that includes 32 Pentium III class processors running at 500 MHz, with a single optimization run requiring approximately eight hours. While the population size is small for the application, it has proven capable of producing good results.
5.3 Classification Performance
Each of the four states was characterized by approximately 50 backscattered waveforms, with the waveforms corresponding to a 1-degree angular sampling rate (variable target-sensor orientation). As discussed above, the GA employs noisy data for design, with white Gaussian noise (WGN) added to the noise-free transient scattered waveforms. Example scattered waveforms from each of the four states are shown in Fig. 2. The different characteristics of
Fig. 2. Representative signals from the four states of a particular target.
these waveforms underscore the variability of the scattered waveforms with aspect. Moreover, we note that the scattered-field energy is also state dependent, making it difficult to define a composite signal-to-noise ratio (SNR) for all states. In particular, while the zero-mean WGN has a fixed standard deviation, the scattered signal strength is state dependent. Consequently, all results below are quantified in terms of the noise standard deviation, rather than SNR. The noise standard deviation employed in the GA design was σ = 0.0185. As a reference, the average SNRs with σ = 0.0185 for states 1 through 4 are 20.04 dB, 17.05 dB, 21.17 dB, and 24.65 dB, respectively, where SNR is calculated as follows (with y(i) the samples of a waveform): $\mathrm{SNR} = \sum_{i=0}^{n} y(i)^2 / \sigma^2$. Using noisy data within the GA cost function implicitly forces the biorthogonal wavelets to be robust to noise, without explicitly enforcing smoothness constraints [4].
The performance of the GA-designed wavelets is compared to
that of three alternative wavelet designs. Each of these alternatives
uses four levels (L � 4) of detail coefficients, p and u filters of
length four, and VQ state-dependent statistical models employing
a 30-element codebook. In the first design, no attempt is made to
adapt the wavelet to the data and the identifiers for all states use
the CDF biorthogonal wavelet [6]. The other two sets of
biorthogonal wavelets employ the design procedure developed
by Claypoole et al. [4]. This method also employs lifting, but the
design criterion is minimization of the error in the prediction
filter p. Here, the procedure in [4] is applied to the scattered
waveforms in each state, from which state-dependent wavelets are
derived. Moreover, as employed in the GA and as in [4], distinct
wavelets are designed at each wavelet level (scale). Following the
work in [4], we considered wavelets designed when one and two
parameters of p are dedicated to matching the data, while the
remaining p and all of the u parameters are dedicated to
smoothness constraints. The classification-based cost function is
too complicated to be solved as in [4], necessitating the GA.
In Fig. 3, we plot the worst-state classification rate (from the
four states considered), with state-dependent VQ classifiers based
on CDF wavelets, two classes of wavelets designed using the
technique in [4], and the GA-designed wavelets. In all cases, the
feature vector employed in the state-dependent VQ classifier was
based on feature vectors composed of the aforementioned wavelet
moments. Results are plotted as a function of the added-noise
standard deviation �. Note that the GA-designed wavelets were
designed for � � 0:0185, while their performance is tested over a
relatively wide range of noise standard deviations �. The wavelets
were designed by the respective methods discussed above. After
deriving the wavelet filters, the results in Fig. 3 were computed
based on a subsequent training of the state-dependent
VQ classifiers using 16 noise realizations, and testing on 16 distinct
noise realizations. The error bars indicate the standard deviation of
these computations. The GA clearly outperforms the other three
designs. We also considered applying the method of [4], which is
based on minimizing the error in the wavelet representation, with
more components of p and u dedicated to signal matching, rather
than smoothness. For these cases, we saw no improvement in
classification performance. As might be expected, this indicates
that the goal of designing wavelets to improve classification is best
realized if this design cost function is directly linked to classifica-
tion, as it is for the GA.
It is also of interest to examine the spectral characteristics of the
GA-derived wavelets vis-à-vis the more traditional wavelet designs. In Fig. 4, we plot the spectral characteristics of the detail and
coarse filters (h and g from Section 2), for the CDF wavelet and for
the GA-designed wavelet for a particular state. The GA designs
distinct wavelet filters for each wavelet level (scale), and here the
comparison is given only for the first level. The GA-designed
wavelet is based on the goal of achieving overall improved
classification and, therefore, it is difficult to explain the detailed
distinction between the CDF and GA-designed wavelets. Never-
theless, the significantly improved classification performance in
Fig. 3 is apparently accrued by the design of wavelets that are
markedly different from traditional (CDF) wavelets.
6 CONCLUSION
We have employed language-based genetic algorithms (GAs) for the design of biorthogonal wavelets within the context of a lifting paradigm [3], demonstrating that the GA-designed wavelets, whose design directly imposes a classification-based fitness function, significantly outperform wavelets based on more traditional design constructs. These results underscore the importance of
Fig. 3. Performance of the classifiers, shown as the worst single-state classification rate vs. the noise level. The results "energy 1" and "energy 2" correspond to designs as in [4] using one and two parameters of p to fit the data.
explicitly imposing the desired objective in the wavelet design.
While this is expected, in practice complicated cost functions are
generally difficult to optimize. We have presented a genetic design
construct that makes such wavelet design relatively straightfor-
ward. Such a procedure could be applied to other wavelet cost
functions of interest, such as entropy minimization.
REFERENCES
[1] A.H. Tewfik, D. Sinha, and P. Jorgensen, "On the Optimal Choice of a Wavelet for Signal Representation," IEEE Trans. Information Theory, vol. 38, pp. 747-765, Mar. 1992.
[2] M. Oslick, I.R. Linscott, S. Maslakovic, and J.D. Twicken, "A General Approach to the Generation of Biorthogonal Bases of Compactly-Supported Wavelets," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP), pp. 1537-1540, 1998.
[3] W. Sweldens, "The Lifting Scheme: A Custom-Design Construction of Biorthogonal Wavelets," J. Applied and Computational Harmonic Analysis, vol. 3, pp. 186-200, 1996.
[4] R.L. Claypoole, R.G. Baraniuk, and R.D. Nowak, "Adaptive Wavelet Transforms via Lifting," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP), 1998.
[5] D.E. Goldberg, Genetic Algorithms. New York: Addison-Wesley, 1989.
[6] A. Cohen, I. Daubechies, and J. Feauveau, "Biorthogonal Bases of Compactly Supported Wavelets," Comm. Pure and Applied Math., vol. 45, pp. 485-560, 1992.
[7] S.G. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1998.
[8] W. Sweldens, "The Lifting Scheme: A Custom-Design Construction of Biorthogonal Wavelets," J. Applied and Computational Harmonic Analysis, vol. 3, pp. 186-200, 1996.
[9] J. Chen and A. Kundu, "Rotation and Gray Scale Transform Invariant Texture Identification Using Wavelet Decomposition and Hidden Markov Model," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 208-214, Feb. 1994.
[10] Y. Linde, A. Buzo, and R.M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. Comm., vol. 28, pp. 84-95, Jan. 1980.
[11] P.R. Runkle, P.K. Bharadwaj, L. Couchman, and L. Carin, "Hidden Markov Models for Multiaspect Target Classification," IEEE Trans. Signal Processing, vol. 47, pp. 2035-2040, July 1999.
[12] S.E. Levinson, "Continuously Variable Duration Hidden Markov Models for Automatic Speech Recognition," Computer Speech and Language, vol. 1, pp. 29-45, Mar. 1986.
[13] J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, Mass.: MIT Press, 1992.
[14] J. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs. Cambridge, Mass.: MIT Press, 1994.
[15] D.H. Kil and F.B. Shin, Pattern Recognition and Prediction with Applications to Signal Characterization. Woodbury, N.Y.: Am. Inst. of Physics, 1996.
[16] P. Naur, "Revised Report on the Algorithmic Language ALGOL 60," Comm. ACM, vol. 6, no. 1, pp. 1-17, 1963.
[17] M. O'Neill and C. Ryan, "Under the Hood of Grammatical Evolution," Proc. Genetic and Evolutionary Computation Conf., W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar, M. Jakiela, and R.E. Smith, eds., pp. 1143-1148, 1999.
[18] R. Poli and W.B. Langdon, "Schema Theory for Genetic Programming with One-Point Crossover and Point Mutation," Evolutionary Computation, vol. 6, no. 3, pp. 231-252, 1998.
[19] P.J. Angeline, "An Investigation into the Sensitivity of Genetic Programming to the Frequency of Leaf Selection During Subtree Crossover," Proc. First Ann. Conf. Genetic Programming, J.R. Koza, D.E. Goldberg, D.B. Fogel, and R.L. Riolo, eds., pp. 21-29, July 1996.
[20] P.J. Angeline, "Subtree Crossover: Building Block Engine or Macromutation?" Proc. Second Ann. Conf. Genetic Programming, J.R. Koza, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon, H. Iba, and R.L. Riolo, eds., pp. 9-17, July 1997.
[21] T. Bäck, F. Hoffmeister, and H. Schwefel, "A Survey of Evolution Strategies," Proc. Fourth Int'l Conf. Genetic Algorithms, pp. 2-9, July 1991.
[22] P. Runkle, L. Carin, L. Couchman, J.A. Bucaro, and T.J. Yoder, "Multiaspect Identification of Submerged Elastic Targets via Wave-Based Matching Pursuits and Hidden Markov Models," J. Acoustical Soc. Am., pp. 605-616, Aug. 1999.
[23] E.A. Jones, "Genetic Design of Antennas and Electronic Circuits," PhD dissertation, Duke Univ., 1999.
Fig. 4. Coarse and detail filter responses for the first level of a typical signal-adapted wavelet, compared to those of a CDF wavelet.