
Genetic Algorithm Wavelet Design for Signal Classification


Short Papers


Eric Jones, Member, IEEE, Paul Runkle, Member, IEEE, Nilanjan Dasgupta, Student Member, IEEE, Luise Couchman, and Lawrence Carin, Fellow, IEEE

Abstract: Biorthogonal wavelets are applied to parse multiaspect transient scattering data in the context of signal classification. A language-based genetic algorithm is used to design wavelet filters that enhance classification performance. The biorthogonal wavelets are implemented via the lifting procedure and the optimization is carried out using a classification-based cost function. Example results are presented for target classification using measured scattering data.

Index Terms: Genetic algorithms, wavelets, classification.


1 INTRODUCTION

Wavelet design is a problem that has attracted significant attention over the last decade. Tewfik et al. [1] designed orthogonal wavelets based on computing bounds on cost functions, the latter based either on minimizing the error between the original and approximate signal representation or on maximizing the norm of the projection of the signal onto the wavelet space. Oslick et al. [2] have developed a paradigm for the design of general biorthogonal wavelets, with this construct amenable to a cost function. These and other wavelet design paradigms are based on frequency-domain filter characteristics. As an alternative, Sweldens [3] has developed a general scheme for designing biorthogonal wavelets, implemented directly in the time domain. This formalism, termed "lifting," yields a simple technique for insertion into a general cost function. For example, Claypoole et al. [4] have developed several techniques for adaptive wavelet design based on lifting. In [4], an efficient design solution was based on a linearly constrained least-squares minimization of the signal-representation error on each wavelet level.

While the wavelet design techniques in [1], [4] have clear utility in the context of compression, there are other applications for which alternative cost functions are desirable. In this paper, we are interested in signal classification based on wavelet-based feature parsing. While error minimization of the wavelet representation may be a salutary goal, it does not address the ultimate objective of improved classification performance. The optimal choice of wavelets for signal classification depends on the details of the signal classes and on the classifier. In many cases, the classifier is too complicated to allow a direct solution for the optimal wavelet representation, suggesting the use of a genetic algorithm (GA) [5] for cost-function optimization. This approach is pursued in this paper, and the classification performance of the GA-designed wavelets is compared to that of classical (Cohen et al. [6]) wavelets, as well as to wavelets designed by minimizing the error in the wavelet representation [4]. It is important to note that the cost function employed in [4] allowed a direct solution, without the need for a GA. However, that cost function does not permit one to address the ultimate goal of improved classification. The cost function introduced here is based explicitly on classifier performance, which does not permit a direct solution for the optimal wavelets. Therefore, we have employed this classification-based cost function in a GA.

2 LIFTING-BASED WAVELET DESIGN

The discrete wavelet transform (DWT) is computed efficiently via a recursive multirate filterbank [7]. At each scale $j$, the filterbank decomposes the signal into low-pass and high-pass components through convolution (and subsequent decimation) with FIR filters $h$ and $g$, respectively. The DWT representation is composed of scaling coefficients, $c(k)$, representing coarse or low-pass signal information at scale $j = 0$, and wavelet coefficients, $d_j(k)$, representing signal detail at scales $j = 1, \ldots, J$. Formally,

$$c_j(k) = \sum_m h(2k - m)\, c_{j+1}(m), \qquad d_j(k) = \sum_m g(2k - m)\, c_{j+1}(m), \tag{1}$$

where the original discrete-time signal is given by $c_J(k)$, of length $2^J$ samples. At the $j$th level, both $c_j(k)$ and $d_j(k)$ are composed of $2^j$ samples, forming a tree-like relationship between the coefficients at successive scales. Signal reconstruction may be effected through the application of the inverse DWT [7].
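As an illustration of (1), the following minimal sketch computes one analysis level by direct summation. The function name `analysis_level`, the circular boundary handling, and the two-tap example filters are assumptions made for illustration; none of them is taken from the paper.

```python
import numpy as np

def analysis_level(c_fine, h, g):
    """One level of (1): filter with h and g, keeping every second output.

    Indices wrap around circularly, which is an assumption of this sketch.
    """
    c_fine = np.asarray(c_fine, dtype=float)
    n = len(c_fine)
    c_coarse = np.zeros(n // 2)
    d = np.zeros(n // 2)
    for k in range(n // 2):
        # c_j(k) = sum_m h(2k - m) c_{j+1}(m); substitute i = 2k - m.
        for i, h_i in enumerate(h):
            c_coarse[k] += h_i * c_fine[(2 * k - i) % n]
        for i, g_i in enumerate(g):
            d[k] += g_i * c_fine[(2 * k - i) % n]
    return c_coarse, d

# Illustrative averaging and differencing filters (hypothetical values).
h = [0.5, 0.5]
g = [0.5, -0.5]
signal = np.array([4.0, 4.0, 2.0, 2.0, 6.0, 6.0, 0.0, 0.0])
coarse, detail = analysis_level(signal, h, g)
```

Each call halves the number of samples, so applying it repeatedly to the coarse output gives the tree of coefficients described above.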

Although standard wavelet families are well-suited for analysis of general signals, it is also possible to design wavelet transforms that are adapted to the signals of interest. Rather than use an orthogonal wavelet basis, we choose biorthogonal wavelets, which allow more flexibility in the system design. Among the advantages of a biorthogonal system is that the filters $h$ and $g$ (in (1)) need not be of the same length, so that the parameters may be repartitioned to meet specific design constraints. In this paper, we exploit an alternative architecture to the multirate analysis filterbank, known as the lifting scheme [8]. It has been shown that any set of wavelet and scaling filters, including those associated with biorthogonal bases, may be decomposed in terms of the lifting structure, which is a lattice-type realization of the multirate filterbank [8].

At each scale, the lifting scheme is implemented in the following manner: The signal under analysis, $c_{j+1}(k)$, is split into its even and odd components, $c_{j+1}(2k)$ and $c_{j+1}(2k+1)$, analogous to the decimation operation in the standard multirate filterbank. An FIR filter, $p$, of order $N_p$ is used to predict the odd components as a linear combination of the even components. The wavelet coefficients may be identified as the "detail" in the higher resolution data that is not predicted by the even component of the analysis signal:

$$d_j(k) = c_{j+1}(2k+1) - \sum_m p(m)\, c_{j+1}(2m + k - n_o), \tag{2}$$

where $n_o = (N_p - 2)/2$ is a temporal shift to properly align the wavelet coefficients according to the prediction filter order. From (1) and (2), $g$ is expressed in terms of the coefficients of $p$:

$$g(2k) = -p(k), \qquad g(2k+1) = \delta(k - n_o), \tag{3}$$

with $k = 0, \ldots, N_p - 1$. From the necessary conditions imposed on the highpass filter [8], we have $\sum_k p(k) = 1$, leaving $N_p - 1$ degrees of freedom remaining for the prediction filter.

890 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 23, NO. 8, AUGUST 2001

E. Jones, P. Runkle, N. Dasgupta, and L. Carin are with the Department of Electrical and Computer Engineering, Duke University, Box 90291, Durham, NC 27708-0291. E-mail: [email protected].

L. Couchman is with the Naval Research Laboratory, Physical Acoustics, Code 7130, Washington, DC 20375-5000. E-mail: [email protected].

Manuscript received 12 Apr. 2000; revised 30 Nov. 2000; accepted 9 Feb. 2001. Recommended for acceptance by A. Kundu. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 111896.

0162-8828/01/$10.00 © 2001 IEEE


Following the prediction step is an "update," which augments the even component of the analysis signal with a combination of the detail to obtain a coarse approximation to the original signal. The update operation utilizes an FIR filter, $u$, of order $N_u$:

$$c_j(k) = c_{j+1}(2k) + \sum_m u(m)\, d_j(m + k - n_1), \tag{4}$$

where $n_1 = (N_p + N_u + 2)/2$ provides the proper temporal shift to align the coarse coefficients in accord with the prediction and update filter orders. From (1) and (4), the filter $h$ is a combination of the prediction and update filters:

$$h(2k) = \delta(k - n_1) - \sum_m p(m)\, u(n - m), \qquad h(2k+1) = u(k). \tag{5}$$

A necessary condition on $h$, to satisfy the wavelet multiresolution property, is a partitioning of unity [8]:

$$\sum_k h(2k) = \sum_k h(2k+1). \tag{6}$$

This yields the necessary condition $\sum_k u(k) = 1/2$. The number of degrees of freedom available for the adaptive multirate filter design at each scale is $N_{df} = N_p + N_u - 2$.
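The predict and update steps can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes circular boundaries, the conventional polyphase indexing in which the odd sample $2k+1$ is predicted from the even samples $k + m - n_o$, and an update shift `n_1 = len(u) // 2` chosen here only to center the update filter. The function names are hypothetical.

```python
import numpy as np

def lifting_stage(c_fine, p, u):
    """Split, predict, update. Returns (coarse, detail), each of half length."""
    c_fine = np.asarray(c_fine, dtype=float)
    even, odd = c_fine[0::2], c_fine[1::2]
    n_o = (len(p) - 2) // 2   # prediction shift, as in the text
    n_1 = len(u) // 2         # update shift: centering assumption of this sketch
    # Predict: detail is the part of the odd samples not predicted by the evens.
    d = odd.copy()
    for m, p_m in enumerate(p):
        d -= p_m * np.roll(even, -(m - n_o))    # even[k + m - n_o], circular
    # Update: add filtered detail to the evens to form the coarse signal.
    coarse = even.copy()
    for m, u_m in enumerate(u):
        coarse += u_m * np.roll(d, -(m - n_1))  # d[k + m - n_1], circular
    return coarse, d

def inverse_lifting_stage(coarse, d, p, u):
    """Undo the update, undo the prediction, then interleave the samples."""
    n_o = (len(p) - 2) // 2
    n_1 = len(u) // 2
    even = np.array(coarse, dtype=float)
    for m, u_m in enumerate(u):
        even -= u_m * np.roll(d, -(m - n_1))
    odd = np.array(d, dtype=float)
    for m, p_m in enumerate(p):
        odd += p_m * np.roll(even, -(m - n_o))
    out = np.empty(2 * len(even))
    out[0::2], out[1::2] = even, odd
    return out
```

Because each step is undone by subtracting exactly what was added, the stage is invertible for any choice of `p` and `u`. That property is what makes the filter coefficients free parameters for an optimizer.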

Within the lifting paradigm, there are several possible design strategies over the remaining $N_{df}$ parameters governing the biorthogonal basis. One such strategy is to suppress all polynomials lower than order $N_p - 1$ at the output of $g$, and pass all polynomials of order $N_u - 1$ through the filter $h$. This approach maximizes the smoothness of the coarse coefficients, while enabling the detail coefficients to represent highpass information; such a design procedure yields the symmetric biorthogonal Cohen, Daubechies, and Feauveau (CDF) wavelets [6]. In the lifting framework, these constraints are satisfied through the solution of linear equations over the space of $p$ and $u$, while the constraints imposed on $g$ and $h$ are not as straightforward [1]. In lieu of applying all $N_{df}$ degrees of freedom to smoothness constraints, one can use some of the available parameters to match the biorthogonal wavelet to the signal(s) of interest. For example, in [4], some of the available parameters are employed to enforce smoothness constraints, while the remainder are used to match (in a least-square-error sense) the biorthogonal wavelet to signals of interest.

3 WAVELET FEATURES

3.1 Features

The principal focus of this paper involves lifting-based wavelet construction via a GA design procedure, the latter employing a cost function linked directly to the classification problem. As demonstrated in Section 4, the GA procedure is applicable to general wavelet features and to a general classifier. Therefore, for demonstration of the basic principles here, we employ simple wavelet features. In particular, our features are based on moments of the normalized wavelet coefficients. These features characterize the envelope shape for a particular level (scale) of wavelet coefficients. Similar feature sets have been used in wavelet-based texture classification systems [9], which represent the temporal/spatial structure of the wavelet coefficients at each scale $j$. This approach provides a compact representation of the wavelet coefficients and is relatively robust to uncertainty in the training data. Also, while the wavelet transform itself is not shift-invariant, the shape of the envelope remains relatively constant for arbitrary shifts.

Prior to feature extraction, the detail coefficients at scale $j$, $d_j(k)$, are squared and normalized to form

$$w_j(k) = \frac{d_j^2(k)}{z_j}, \qquad z_j = \sum_k d_j^2(k). \tag{7}$$

The signal $w_j(k)$ (denoted $w_j$) is defined to satisfy the necessary conditions of a probability mass function. The moments about the mean of $w_j$ are given by

$$m_{rj} = \sum_k (k - m_{1j})^r\, w_j(k), \qquad m_{1j} = \sum_k k\, w_j(k). \tag{8}$$

The feature set used in this paper is composed of the variance (breadth), skewness (asymmetry), and kurtosis (peakedness) of $w_j$. These parameters are derived from $m_{rj}$, $r = 2, 3, 4$; $j = 1, \ldots, J$ (we typically only use features from $L < J$ wavelet levels).
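A short sketch of (7) and (8) for a single wavelet level follows. The text does not spell out how skewness and kurtosis are derived from the central moments, so the standardized forms $m_3 / m_2^{3/2}$ and $m_4 / m_2^2$ used here are an assumption, as is the function name `moment_features`. The input is assumed to have nonzero energy.

```python
import numpy as np

def moment_features(d):
    """Variance, skewness, kurtosis of the normalized squared detail signal."""
    d = np.asarray(d, dtype=float)
    w = d**2 / np.sum(d**2)           # (7): nonnegative and sums to one
    k = np.arange(len(d))
    mean = np.sum(k * w)              # first moment m_1j
    m2, m3, m4 = (np.sum((k - mean)**r * w) for r in (2, 3, 4))  # (8)
    # Standardized skewness and kurtosis (assumed normalization).
    return m2, m3 / m2**1.5, m4 / m2**2

variance, skewness, kurtosis = moment_features([0.0, 1.0, 2.0, 1.0, 0.0])
```

Concatenating the three values from each of $L$ levels gives one feature vector per waveform.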

3.2 Statistical Model for Features

If we consider $L$ wavelet levels, the moment-based features discussed above yield a $3L$-dimensional feature vector, $\mathbf{v}$. The statistical distribution of these feature vectors is characterized via vector quantization (VQ) trained using a K-means algorithm [10], with the features mapped onto integers corresponding to the nearest-neighbor codebook element: $k = Q(\mathbf{v})$, $k \in \{1, \ldots, K\}$, for a $K$-element codebook. In the work presented here, we are interested in transient scattering from a general target, with such scattering typically a strong function of the target-sensor orientation [11]. Therefore, we define target states [11], with each state characteristic of a set of contiguous target-sensor orientations for which the scattering is relatively stationary [11]. A VQ-based classifier is designed for each state, with this construct motivated by previous research, in which these states are employed in a hidden Markov model (HMM) [11].

Given target states $S_m$, $m = 1, \ldots, M$, we first define a VQ codebook using feature vectors originating from all states. Given this codebook, the feature vectors from state $S_m$ are used to define a state-dependent probability mass function for the codebook elements, $p(Q(\mathbf{v}) = k \mid S_m)$, which we abbreviate as $p(\mathbf{v} \mid S_m)$. VQ was selected over other statistical feature models (such as Gaussian mixtures [12]) due to the algorithm's simplicity and the fact that our training data set (see Section 5) was relatively small. Prior to application of the K-means algorithm, the features are normalized such that a Euclidean-distance metric applied across the heterogeneous feature dimensions is appropriate.
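The codebook and the state-dependent probability mass function can be sketched as follows. This is a simplified stand-in for the K-means procedure of [10]: the evenly spaced deterministic initialization, the fixed iteration count, and the small probability floor for unseen codebook elements are all assumptions of this sketch. The features are assumed to be normalized already, and the function names are hypothetical.

```python
import numpy as np

def quantize(vectors, codebook):
    """Q(v): index of the nearest codebook element (Euclidean distance)."""
    X = np.atleast_2d(np.asarray(vectors, dtype=float))
    dist = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dist, axis=1)

def train_codebook(vectors, K, iterations=20):
    """Plain Lloyd iterations on feature vectors pooled from all states."""
    X = np.asarray(vectors, dtype=float)
    # Deterministic start (assumption): K rows spread evenly through the data.
    codebook = X[np.linspace(0, len(X) - 1, K).astype(int)].copy()
    for _ in range(iterations):
        codes = quantize(X, codebook)
        for j in range(K):
            members = X[codes == j]
            if len(members) > 0:      # keep the old centroid if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def state_pmf(vectors, codebook, floor=1e-3):
    """Estimate p(Q(v) = k | S_m) from the feature vectors of one state."""
    counts = np.bincount(quantize(vectors, codebook), minlength=len(codebook))
    counts = counts.astype(float) + floor   # floor avoids zero probabilities
    return counts / counts.sum()
```

In use, `train_codebook` is called once on the pooled training vectors, and `state_pmf` is then called once per state with that state's vectors only.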

4 GENETIC ALGORITHM IMPLEMENTATION

Genetic algorithms (GAs) constitute an optimization technique based on the "survival of the fittest" paradigm found in nature. The fundamentals of traditional GAs are well covered in [5]. Also, [13], [14] cover a GA variant called genetic programming that is relevant to the methods used here. Genetic algorithms work with an abstract representation of a design called a chromosome. Here, we choose a tree structure that is compatible with language-based optimization [13], [14], [15], [16]. A classifier language describes the architecture of the classifier. The dictionary of words, or lexicon, for the language defines the components, subcomponents, and numerical parameters necessary to build a wavelet-based classifier. The language's grammar defines how these pieces are connected together. Newly generated classifier designs must be grammatically correct in order to be valid systems.

4.1 Classifier Language

The problem at hand assumes that we have $M$ states $S_m$, each state defined by an associated ensemble of transient scattered waveforms. Each state is representative of a set of target-sensor orientations over which the scattering physics is stationary. Feature vector $\mathbf{v}$ is classified as associated with state $S_m$ if $p(\mathbf{v} \mid S_m) > p(\mathbf{v} \mid S_k)\ \forall\, k \neq m$ (maximum-likelihood discrimination). Our goal is to design distinct wavelet filters for each state $S_m$, such that the likelihood of classifying a scattered waveform with the correct state is maximized. The cost function employed in the GA discussed below maximizes the minimum probability of correct classification along the confusion-matrix diagonal [15].
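The maximum-likelihood rule and the confusion matrix it produces can be sketched as below. The input layout is an assumption: each row holds, for one test waveform, the likelihoods reported by the $M$ state identifiers, and every state is assumed to appear at least once among the true labels. Ties are broken in favor of the lowest state index, which is a property of `argmax` and not something the text specifies.

```python
import numpy as np

def confusion_matrix(likelihoods, true_states):
    """Row s, column t: fraction of state-s waveforms assigned to state t."""
    L = np.asarray(likelihoods, dtype=float)    # shape (num_waveforms, M)
    M = L.shape[1]
    conf = np.zeros((M, M))
    for row, s in zip(L, true_states):
        conf[s, np.argmax(row)] += 1            # maximum-likelihood decision
    return conf / conf.sum(axis=1, keepdims=True)

# Hypothetical likelihoods for four waveforms and M = 2 states.
example = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6], [0.1, 0.6]]
conf = confusion_matrix(example, true_states=[0, 0, 1, 1])
worst_rate = np.diag(conf).min()
```

The diagonal of `conf` holds the per-state probabilities of correct classification, and its minimum is the worst-state rate that the cost function targets.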

While it can easily generalize to $M$ states, the following grammar (see Fig. 1) defines how the individual components of a two-state classifier fit within the GA design. A classifier is made up of two state_identifiers and a maximum-likelihood processor, max. Each state_identifier has two subcomponents, a feature_extractor and a statistical model. The feature_extractor also has two parts, a lifter_transform and the moment_features block. The statistical model used is vector_quantization with 30 codebook elements.

    ST → classifier IDENTIFIERS LIKE_PROCR
    IDENTIFIERS → list_join IDENTIFIER IDENTIFIER
    LIKE_PROCR → max
    IDENTIFIER → state_identifier EXTRACTOR STAT_MODEL
    EXTRACTOR → feature_extractor LIFTER FEATURES
    LIFTER → lifter_transform LIFTER_STAGES
    FEATURES → moment_features
    STAT_MODEL → DISCRETE
    DISCRETE → vector_quantization CODE_COUNT
    CODE_COUNT → 30.                                        (9)

Note that the list_join operator simply groups a set of objects together. Here, it groups two state_identifier objects. Later, it is used to combine numbers into a list of filter coefficients.

In (9), which is written in Backus-Naur form [16], the arrow symbol (→) indicates transformation or substitution. Symbols on the left of the arrow can be transformed into symbols on the right-hand side of the arrow. The | symbol is the "or" operator. It indicates that the left-hand side symbol can be transformed into (replaced by) any of the rules on the right-hand side. Uppercase symbols in the rules are nonterminal symbols and lowercase bold symbols are terminal symbols. The derived structure can only contain terminal symbols. Nonterminal symbols are used to define intermediate steps in the process of generating a valid sentence. Whenever a nonterminal is transformed into a rule that itself contains a nonterminal(s), one of the rules for that symbol is applied. All grammars have a start symbol that indicates which transformation in the grammar to begin with when generating a sentence. The start symbol here is ST.

The grammar in (9) is very rigid in that it does not allow for variations in the system, i.e., there is only a single choice for the likelihood processor (max), the statistical model (vector_quantization), and all the other components in the system. However, the LIFTER_STAGES symbol has not yet been defined. The GA allows variation in the number of lifter levels $L$ (three or four) and in the length of the $p$ and $u$ filters in each stage (four or five), and it allows the $p$ and $u$ filter coefficients to have any value in the range $[-1, 1]$ (see footnote 1).

    LIFTER_STAGES → list_join STAGE STAGE2
    STAGE2 → list_join STAGE STAGE3
    STAGE3 → list_join STAGE STAGE4 | STAGE
    STAGE4 → STAGE
    STAGE → lifter_stage P U
    P → COEFFICIENTS
    U → COEFFICIENTS
    COEFFICIENTS → list_join NUMBER COEFFICIENTS2
    COEFFICIENTS2 → list_join NUMBER COEFFICIENTS3
    COEFFICIENTS3 → list_join NUMBER COEFFICIENTS4
    COEFFICIENTS4 → list_join NUMBER COEFFICIENTS5 | NUMBER
    COEFFICIENTS5 → NUMBER
    NUMBER → float[-1, 1].                                  (10)

Combining the grammars in (9) and (10) yields the complete grammar for the classifier language. Fig. 1 shows one of the many classifier chromosomes generated from the grammar. O'Neil and Ryan [17] have considered a related approach, employing string chromosomes rather than the tree chromosomes used here.
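To make the derivation process concrete, the sketch below encodes the rules of (9) and (10) as a dictionary and expands the start symbol into a random tree chromosome. The node representation (dictionaries with `type` and `children`), the uniform choice among alternatives, and the helper names are assumptions of this sketch and are not described in the paper.

```python
import random

# Each nonterminal maps to its list of alternative rules.
GRAMMAR = {
    "ST": [["classifier", "IDENTIFIERS", "LIKE_PROCR"]],
    "IDENTIFIERS": [["list_join", "IDENTIFIER", "IDENTIFIER"]],
    "LIKE_PROCR": [["max"]],
    "IDENTIFIER": [["state_identifier", "EXTRACTOR", "STAT_MODEL"]],
    "EXTRACTOR": [["feature_extractor", "LIFTER", "FEATURES"]],
    "LIFTER": [["lifter_transform", "LIFTER_STAGES"]],
    "FEATURES": [["moment_features"]],
    "STAT_MODEL": [["DISCRETE"]],
    "DISCRETE": [["vector_quantization", "CODE_COUNT"]],
    "CODE_COUNT": [["30"]],
    "LIFTER_STAGES": [["list_join", "STAGE", "STAGE2"]],
    "STAGE2": [["list_join", "STAGE", "STAGE3"]],
    "STAGE3": [["list_join", "STAGE", "STAGE4"], ["STAGE"]],
    "STAGE4": [["STAGE"]],
    "STAGE": [["lifter_stage", "P", "U"]],
    "P": [["COEFFICIENTS"]],
    "U": [["COEFFICIENTS"]],
    "COEFFICIENTS": [["list_join", "NUMBER", "COEFFICIENTS2"]],
    "COEFFICIENTS2": [["list_join", "NUMBER", "COEFFICIENTS3"]],
    "COEFFICIENTS3": [["list_join", "NUMBER", "COEFFICIENTS4"]],
    "COEFFICIENTS4": [["list_join", "NUMBER", "COEFFICIENTS5"], ["NUMBER"]],
    "COEFFICIENTS5": [["NUMBER"]],
}

def derive(symbol, rng):
    """Expand a nonterminal into a tree node; leaves are terminals or floats."""
    if symbol == "NUMBER":
        return {"type": "NUMBER", "children": [rng.uniform(-1.0, 1.0)]}
    rule = rng.choice(GRAMMAR[symbol])
    children = [derive(s, rng) if s in GRAMMAR or s == "NUMBER" else s
                for s in rule]
    return {"type": symbol, "children": children}

def terminals(node):
    """Flatten a tree into its left-to-right list of terminal symbols."""
    out = []
    for child in node["children"]:
        out.extend(terminals(child) if isinstance(child, dict) else [child])
    return out

def nodes_of_type(node, wanted):
    """Collect every node whose grammatical type equals `wanted`."""
    found = [node] if node["type"] == wanted else []
    for child in node["children"]:
        if isinstance(child, dict):
            found.extend(nodes_of_type(child, wanted))
    return found

chromosome = derive("ST", random.Random(0))
```

Every tree produced this way is grammatically valid by construction: it always has two state identifiers, three or four lifter stages per identifier, and four or five coefficients per filter.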


Fig. 1. Chromosome for a two-state classifier. Gray tags indicate the grammatical type of each node. Information for the coefficient arrays for the p and u filters has been omitted to conserve space.

1. While the GA can choose any numerical value for the $p$ and $u$ coefficients, the values are always postprocessed to enforce $\sum_k p(k) = 1$ and $\sum_k u(k) = 1/2$. This is done by subtracting the appropriate DC offset from the filter coefficients.
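The postprocessing in footnote 1 amounts to one line per filter. A minimal sketch follows, with a hypothetical helper name and made-up coefficient values:

```python
import numpy as np

def enforce_sum(coeffs, target):
    """Subtract a constant (DC) offset so the coefficients sum to `target`."""
    c = np.asarray(coeffs, dtype=float)
    return c - (c.sum() - target) / len(c)

raw_p = [0.9, -0.3, 0.4, 0.2]     # hypothetical GA output for p
raw_u = [0.1, 0.5, -0.2, 0.3]     # hypothetical GA output for u
p = enforce_sum(raw_p, 1.0)       # sum of p(k) equals 1
u = enforce_sum(raw_u, 0.5)       # sum of u(k) equals 1/2
```

Because the same offset is removed from every tap, differences between neighboring coefficients are unchanged. Note that the shifted values can fall slightly outside the range the GA sampled from.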


4.2 Crossover for Tree Chromosomes

Breeding two chromosomes is done using the crossover operator. In crossover, a node is selected from the tree of each of two parent chromosomes. These nodes, along with their complete subtrees, are then swapped between the parents to form two new child chromosomes. In order for the children to be valid classifier systems, i.e., grammatically correct trees, crossover can only occur between nodes that have the same grammatical type. There is a vast literature on such crossovers, with the reader referred to [18], [19], [20].

Every node within a tree is a possible crossover site. This can be undesirable for several reasons. First, a large percentage of the nodes in a tree are leaf nodes (half in full binary trees). As a result, a high percentage of the crossover operations occur at leaf nodes and exchange only a single node between the parents. Disallowing crossover between grammatical types that only occur at leaf nodes (such as NUMBER) forces crossovers to occur at higher level nodes and increases the average amount of information exchanged between parents. Second, there are portions of the tree chromosome that are identical in all trees. For instance, because of the grammar definition in (9), all STAT_MODEL subtrees are identical in every state_identifier. Crossover at a STAT_MODEL typed node generates offspring that are clones of their parents. Removing such nodes from the list of possible crossover sites increases the chance that offspring have different designs than their parents.
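Type-restricted subtree crossover can be sketched as follows. The tree representation (dictionaries with `type` and `children`), the `banned` parameter listing grammatical types excluded as crossover sites, and the decision to skip the root are assumptions of this sketch; the paper does not give implementation details.

```python
import copy
import random

def sites(node, banned):
    """List (parent, child_index) pairs for every allowed crossover site."""
    found = []
    for i, child in enumerate(node["children"]):
        if isinstance(child, dict):
            if child["type"] not in banned:
                found.append((node, i))
            found.extend(sites(child, banned))
    return found

def site_type(site):
    parent, index = site
    return parent["children"][index]["type"]

def crossover(parent_a, parent_b, rng, banned=("NUMBER", "STAT_MODEL")):
    """Swap two subtrees of the same grammatical type; parents are untouched."""
    a, b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    sites_a, sites_b = sites(a, banned), sites(b, banned)
    common = sorted({site_type(s) for s in sites_a} &
                    {site_type(s) for s in sites_b})
    if not common:
        return a, b                      # nothing compatible to exchange
    chosen = rng.choice(common)
    pa, ia = rng.choice([s for s in sites_a if site_type(s) == chosen])
    pb, ib = rng.choice([s for s in sites_b if site_type(s) == chosen])
    pa["children"][ia], pb["children"][ib] = pb["children"][ib], pa["children"][ia]
    return a, b

# Two tiny hand-built trees (hypothetical, one coefficient per filter).
tree_a = {"type": "STAGE", "children": ["lifter_stage",
          {"type": "P", "children": [0.1]}, {"type": "U", "children": [0.2]}]}
tree_b = {"type": "STAGE", "children": ["lifter_stage",
          {"type": "P", "children": [0.3]}, {"type": "U", "children": [0.4]}]}
child_a, child_b = crossover(tree_a, tree_b, random.Random(0), banned=("U",))
```

Since both swapped subtrees have the same grammatical type, each child remains a valid sentence of the grammar.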

As detailed in Section 5, a small percentage of the numerical parameters are mutated each generation, using Gaussian mutation with a standard deviation equal to a selected percentage of the parameter's value [21].
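A minimal sketch of this mutation operator is given below. The default rate (3 percent) and relative standard deviation (10 percent) are the values reported in Section 5.2; the function name and the use of the absolute value to keep the standard deviation nonnegative are assumptions. No clipping to a fixed range is applied, since the text does not mention any.

```python
import random

def gaussian_mutation(params, rng, rate=0.03, relative_std=0.10):
    """Perturb each parameter with probability `rate`.

    The noise standard deviation is `relative_std` times the magnitude of
    the parameter, so a parameter equal to zero is left unchanged.
    """
    mutated = []
    for value in params:
        if rng.random() < rate:
            value = rng.gauss(value, relative_std * abs(value))
        mutated.append(value)
    return mutated

coefficients = [0.5, -0.25, 0.8, 0.1]
new_coefficients = gaussian_mutation(coefficients, random.Random(0))
```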

5 EXAMPLE RESULTS

5.1 Preliminaries

We consider time-domain acoustic scattering from a submerged elastic target. The details of the measurement and of the target are found in [22]. As discussed above, the backscattered signal from such a target is a strong function of the target-sensor orientation. However, one can define states $S_m$ over which the transient scattered signal is stationary [11] as a function of aspect. In the work presented here, we consider $M = 4$ states. Each scattered waveform is parsed via a biorthogonal wavelet transform and three moments are computed for each wavelet level (scale), as discussed in Section 3.1. The likelihood $p(\mathbf{v} \mid S_k)$ is quantified here via K-means vector quantization (VQ) [10] and, as indicated in (9), we consider a 30-element codebook. The codebook for each statistical model is generated using five noisy realizations of all four states, giving a total of approximately 1,000 training vectors. A distinct biorthogonal wavelet is designed for each of the $M = 4$ states. Our goal is to design four wavelet filters, each matched to the corresponding target state, that maximize classification performance.

Each GA chromosome is analyzed as follows:

1. A classifier system is built based on the blueprint found in the chromosome.

2. The four state-dependent statistical models are trained to recognize their respective state, using five realizations of the noisy data set (as mentioned above, constituting approximately 1,000 training vectors).

3. The classifier is tested using a new set of 15 noise realizations.

4. The chromosome fitness is calculated. The chromosome fitness is a quantity that characterizes the quality of the individual.

In this context, the performance of the classifier is characterized with a confusion matrix. The confusion matrix can be reduced to a fitness value in a variety of ways. For example, one can use the average classification rate. Alternatively, one can choose the worst classification rate as the fitness value, effectively forcing all states to have an acceptable classification rate. The chosen fitness function combines the two approaches: $\text{Fitness} = \min(C) + a \cdot \text{mean}(C)$. Here, $C$ is the diagonal of the confusion matrix, and $a$ is a weighting factor. A value of $a = 0.2$ has proven to yield results that balance minimum performance standards with high overall correct classification rates.
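The fitness computation can be written directly from the formula. In this sketch the confusion matrix is assumed to hold raw counts with true states along the rows, and it is row-normalized before the diagonal is taken; the function name and example counts are hypothetical.

```python
import numpy as np

def fitness(confusion, a=0.2):
    """Fitness = min(C) + a * mean(C), with C the per-state correct rates."""
    confusion = np.asarray(confusion, dtype=float)
    rates = confusion / confusion.sum(axis=1, keepdims=True)
    C = np.diag(rates)
    return C.min() + a * C.mean()

# Two states: 9 of 10 and 8 of 10 waveforms classified correctly.
score = fitness([[9, 1], [2, 8]])   # 0.8 + 0.2 * 0.85 = 0.97
```

With $a = 0.2$ the worst state dominates the score, and the mean term mainly separates candidates whose worst-state rates are equal.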

5.2 Genetic Algorithm Parameters

A steady-state GA [5] with a single population of 150 individuals was evolved for 30 generations using a crossover rate of 80 percent and a replacement rate of 90 percent. A small percentage (3 percent) of the numerical parameters were mutated each generation using Gaussian mutation with a standard deviation equal to 10 percent of the parameter's value. Mutation was not used to alter the tree topology. The GA parameters were chosen based on success in past applications [23]. The population size was chosen based on computational resources. The GA was run on a cluster of workstations that includes 32 Pentium III class processors running at 500 MHz, with a single optimization run requiring approximately eight hours. While the population size is small for the application, it has proven capable of producing good results.
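The loop below is a rough sketch of how these parameters fit together, demonstrated on a toy problem and not on the classifier design task. Several points are assumptions, since the text does not specify them: parents are drawn from the better half of the population (truncation selection), "replacement rate" is interpreted as the fraction of the population replaced by offspring each generation, and all function names are hypothetical.

```python
import random

def evolve(fitness, new_individual, crossover, mutate, rng,
           population_size=150, generations=30,
           crossover_rate=0.80, replacement_rate=0.90):
    """Each generation keeps the best individuals and replaces the rest."""
    population = [new_individual(rng) for _ in range(population_size)]
    n_replace = round(replacement_rate * population_size)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:population_size - n_replace]
        parents = population[:max(2, population_size // 2)]  # assumed selection
        offspring = []
        while len(offspring) < n_replace:
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_rate:
                child = crossover(a, b, rng)
            else:
                child = list(a)          # copy one parent unchanged
            offspring.append(mutate(child, rng))
        population = survivors + offspring
    return max(population, key=fitness)

# Toy problem (illustrative only): drive a 4-vector toward zero.
def sphere_fitness(x):
    return -sum(v * v for v in x)

def random_vector(rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(4)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def jitter(x, rng):
    return [v + rng.gauss(0.0, 0.05) for v in x]

best = evolve(sphere_fitness, random_vector, one_point, jitter, random.Random(0))
```

Because the survivors always include the current best individual, the best fitness in this sketch cannot decrease from one generation to the next.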

5.3 Classification Performance

Each of the four states was characterized by approximately 50 backscattered waveforms, with the waveforms corresponding to a 1 degree angular sampling rate (variable target-sensor orientation). As discussed above, the GA employs noisy data for design, with white Gaussian noise (WGN) added to the noise-free transient scattered waveforms. Example scattered waveforms from each of the four states are shown in Fig. 2. The different characteristics of these waveforms underscore the variability of the scattered waveforms with aspect. Moreover, we note that the scattered-field energy is also state dependent, making it difficult to define a composite signal-to-noise ratio (SNR) for all states. In particular, while the zero-mean WGN has a fixed standard deviation, the scattered signal strength is state dependent. Consequently, all results below are quantified in terms of the noise standard deviation, rather than SNR. The noise standard deviation employed in the GA design was $\sigma = 0.0185$. As a reference, the average SNRs with $\sigma = 0.0185$ for states 1 through 4 are 20.04 dB, 17.05 dB, 21.17 dB, and 24.65 dB, respectively, where SNR is calculated as follows (with $y(i)$ the samples of a waveform): $\text{SNR} = \sum_{i=0}^{n} y(i)^2 / \sigma^2$. Using noisy data within the GA cost function implicitly forces the biorthogonal wavelets to be robust to noise, without explicitly enforcing smoothness constraints [4].

Fig. 2. Representative signals from the four states of a particular target.
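The SNR definition and the generation of a noisy realization can be sketched as below. The conversion to decibels as $10 \log_{10}$ of the energy ratio is an assumption, since the text gives only the ratio itself; the function names and the synthetic waveform are hypothetical.

```python
import numpy as np

def snr_db(y, sigma):
    """SNR = sum_i y(i)^2 / sigma^2, expressed in decibels."""
    y = np.asarray(y, dtype=float)
    return 10.0 * np.log10(np.sum(y**2) / sigma**2)

def add_wgn(y, sigma, rng):
    """Return one noisy realization: y plus zero-mean white Gaussian noise."""
    y = np.asarray(y, dtype=float)
    return y + rng.normal(0.0, sigma, size=len(y))

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 4.0 * np.pi, 128)) * 0.05   # synthetic waveform
noisy = add_wgn(clean, sigma=0.0185, rng=rng)
reference_snr = snr_db(clean, sigma=0.0185)
```

Since the noise level is fixed, waveforms with more energy have a higher SNR under this definition, which is why the four states report different values for the same $\sigma$.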

The performance of the GA-designed wavelets is compared to

that of three alternative wavelet designs. Each of these alternatives

uses four levels (L � 4) of detail coefficients, p and u filters of

length four, and VQ state-dependent statistical models employing

a 30-element codebook. In the first design, no attempt is made to

adapt the wavelet to the data and the identifiers for all states use

the CDF biorthogonal wavelet [6]. The other two sets of

biorthogonal wavelets employ the design procedure developed

by Claypoole et al. [4]. This method also employs lifting, but the

design criterion is minimization of the error in the prediction

filter p. Here, the procedure in [4] is applied to the scattered

waveforms in each state, from which state-dependent wavelets are

derived. Moreover, as employed in the GA and as in [4], distinct

wavelets are designed at each wavelet level (scale). Following the

work in [4], we considered wavelets designed when one and two

parameters of p are dedicated to matching the data, while the

remaining p and all of the u parameters are dedicated to

smoothness constraints. The classification-based cost function is

too complicated to be solved as in [4], necessitating the GA.
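To make the roles of the length-four p (predict) and u (update) filters concrete, the following Python sketch implements one generic lifting step and its inverse, and cascades it over several levels. It is an illustration under stated assumptions, not the authors' implementation: the filter taps `P` and `U` are the standard interpolating values (not GA-designed or taken from [4]), the tap alignment and periodic signal extension are assumptions, and all function names are hypothetical. Any choice of p and u yields perfect reconstruction, which is what allows a design procedure to vary them freely.

```python
# Illustrative taps (assumption): standard 4-tap interpolating predictor
# and the matching update filter; not the paper's designed filters.
P = [-1 / 16, 9 / 16, 9 / 16, -1 / 16]
U = [-1 / 32, 9 / 32, 9 / 32, -1 / 32]


def _filt(seq, taps, offset):
    """Circular filtering: tap j weights seq[k + offset + j] (periodic)."""
    n = len(seq)
    return [sum(t * seq[(k + offset + j) % n] for j, t in enumerate(taps))
            for k in range(n)]


def lift_forward(x, p=P, u=U):
    """One lifting level on an even-length signal: split, predict, update."""
    even, odd = x[0::2], x[1::2]
    # Detail = odd samples minus their prediction from neighboring evens.
    d = [o - q for o, q in zip(odd, _filt(even, p, -1))]
    # Coarse = even samples plus an update computed from the details.
    c = [e + v for e, v in zip(even, _filt(d, u, -2))]
    return c, d


def lift_inverse(c, d, p=P, u=U):
    """Undo lift_forward exactly by reversing the update and predict steps."""
    even = [ci - v for ci, v in zip(c, _filt(d, u, -2))]
    odd = [di + q for di, q in zip(d, _filt(even, p, -1))]
    x = [0.0] * (len(c) + len(d))
    x[0::2] = even
    x[1::2] = odd
    return x


def lifting_dwt(x, levels=4):
    """Cascade lift_forward on the coarse branch; returns (coarse, details)."""
    details = []
    c = list(x)
    for _ in range(levels):
        c, d = lift_forward(c)
        details.append(d)
    return c, details
```

With `levels=4`, the input length must be a multiple of 16 so that each level splits an even-length sequence. Distinct filters per level, as used in the designs compared here, would amount to passing a different `p` and `u` to `lift_forward` at each level.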

In Fig. 3, we plot the worst-state classification rate (from the

four states considered), with state-dependent VQ classifiers based

on CDF wavelets, two classes of wavelets designed using the

technique in [4], and the GA-designed wavelets. In all cases, the feature vectors employed in the state-dependent VQ classifiers were composed of the aforementioned wavelet moments. Results are plotted as a function of the added-noise standard deviation $\sigma$. Note that the GA-designed wavelets were designed for $\sigma = 0.0185$, while their performance is tested over a relatively wide range of noise standard deviations $\sigma$. The wavelets

were designed by the respective methods discussed above. After

deriving the wavelet filters, the results in Fig. 3 were computed

based on a subsequent training of the state-dependent

VQ classifiers using 16 noise realizations, and testing on 16 distinct

noise realizations. The error bars indicate the standard deviation of

these computations. The GA clearly outperforms the other three

designs. We also considered applying the method of [4], which is

based on minimizing the error in the wavelet representation, with

more components of p and u dedicated to signal matching, rather

than smoothness. For these cases, we saw no improvement in

classification performance. As might be expected, this indicates

that the goal of designing wavelets to improve classification is best

realized if the design cost function is directly linked to classification, as it is for the GA.
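The worst-state rate and error bars described for Fig. 3 can be summarized with a short Python sketch. It assumes that the worst state is identified separately for each noise realization and that the error bar is the sample standard deviation across realizations; the text does not specify either detail, and the function name `worst_state_summary` is hypothetical.

```python
from statistics import mean, stdev


def worst_state_summary(rates):
    """Summarize worst-state classification rates over noise realizations.

    rates[r][s] is the fraction of correctly classified test waveforms for
    noise realization r and target state s. Returns the mean and the sample
    standard deviation of the per-realization minimum over states
    (both conventions are assumptions, not taken from the paper).
    """
    worst = [min(per_state) for per_state in rates]
    return mean(worst), stdev(worst)
```

For the experiment described above, `rates` would hold 16 rows (one per test noise realization) of four per-state rates at a given noise standard deviation.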

It is also of interest to examine the spectral characteristics of the

GA-derived wavelets vis-à-vis the more traditional wavelet designs. In Fig. 4, we plot the spectral characteristics of the coarse and detail filters (h and g from Section 2, respectively), for the CDF wavelet and for

the GA-designed wavelet for a particular state. The GA designs

distinct wavelet filters for each wavelet level (scale), and here the

comparison is given only for the first level. The GA-designed

wavelet is based on the goal of achieving overall improved

classification and, therefore, it is difficult to explain the detailed

distinction between the CDF and GA-designed wavelets. Nevertheless, the significantly improved classification performance in

Fig. 3 is apparently accrued by the design of wavelets that are

markedly different from traditional (CDF) wavelets.

6 CONCLUSION

We have employed language-based genetic algorithms (GAs) for

the design of biorthogonal wavelets within the context of a lifting

paradigm [3], demonstrating that the GA-designed wavelets,

which directly impose a classification-based fitness function,

significantly outperform other wavelets based on more traditional

design constructs. These results underscore the importance of


Fig. 3. Performance of the classifiers, shown as the worst classification rate for a single state vs. the noise level. The results “energy 1” and “energy 2” correspond to designs as in [4] using one and two parameters of p to fit the data.


explicitly imposing the desired objective in the wavelet design.

While this is expected, in practice complicated cost functions are

generally difficult to optimize. We have presented a genetic design

construct that makes such wavelet design relatively straightforward. Such a procedure could be applied to other wavelet cost

functions of interest, such as entropy minimization.

REFERENCES

[1] A.H. Tewfik, D. Sinha, and P. Jorgensen, “On the Optimal Choice of a Wavelet for Signal Representation,” IEEE Trans. Information Theory, vol. 38, pp. 747-765, Mar. 1992.

[2] M. Oslick, I.R. Linscott, S. Maslakovic, and J.D. Twicken, “A General Approach to the Generation of Biorthogonal Bases of Compactly-Supported Wavelets,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP), pp. 1537-1540, 1998.

[3] W. Sweldens, “The Lifting Scheme: A Custom-Design Construction of Biorthogonal Wavelets,” J. Applied and Computational Harmonic Analysis, vol. 3, pp. 186-200, 1996.

[4] R.L. Claypoole, R.G. Baraniuk, and R.D. Nowak, “Adaptive Wavelet Transforms via Lifting,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP), 1998.

[5] D.E. Goldberg, Genetic Algorithms. New York: Addison-Wesley, 1989.

[6] A. Cohen, I. Daubechies, and J. Feauveau, “Biorthogonal Bases of Compactly Supported Wavelets,” Comm. Pure Applied Math., vol. 45, pp. 485-560, 1992.

[7] S.G. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1998.

[8] W. Sweldens, “The Lifting Scheme: A Custom-Design Construction of Biorthogonal Wavelets,” J. Applied and Computational Harmonic Analysis, vol. 3, pp. 186-200, 1996.

[9] J. Chen and A. Kundu, “Rotation and Gray Scale Transform Invariant Texture Identification Using Wavelet Decomposition and Hidden Markov Model,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 208-214, Feb. 1994.

[10] Y. Linde, A. Buzo, and R.M. Gray, “An Algorithm for Vector Quantizer Design,” IEEE Trans. Comm., vol. 28, pp. 84-95, Jan. 1980.

[11] P.R. Runkle, P.K. Bharadwaj, L. Couchman, and L. Carin, “Hidden Markov Models for Multiaspect Target Classification,” IEEE Trans. Signal Processing, vol. 47, pp. 2035-2040, July 1999.

[12] S.E. Levinson, “Continuously Variable Duration Hidden Markov Models for Automatic Speech Recognition,” Computers, Speech, and Language, vol. 1, pp. 29-45, Mar. 1986.

[13] J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge: MIT Press, 1992.

[14] J. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs. Cambridge: MIT Press, 1994.

[15] D.H. Kil and F.B. Shin, Pattern Recognition and Prediction with Applications to Signal Characterization. Woodbury, N.Y.: Am. Inst. of Physics, 1996.

[16] P. Naur, “Revised Report on the Algorithmic Language ALGOL 60,” Comm. ACM, vol. 6, no. 1, pp. 1-17, 1963.

[17] M. O'Neill and C. Ryan, “Under the Hood of Grammatical Evolution,” Proc. of the Genetic and Evolutionary Computation Conf., W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar, M. Jakiela, and R.E. Smith, eds., pp. 1143-1148, 1999.

[18] R. Poli and W.B. Langdon, “Schema Theory for Genetic Programming with One-Point Crossover and Point Mutation,” Evolutionary Computation, vol. 6, no. 3, pp. 231-252, 1998.

[19] P.J. Angeline, “An Investigation into the Sensitivity of Genetic Programming to the Frequency of Leaf Selection During Subtree Crossover,” Proc. First Ann. Conf. Genetic Programming, J.R. Koza, D.E. Goldberg, D.B. Fogel, and R.L. Riolo, eds., pp. 21-29, July 1996.

[20] P.J. Angeline, “Subtree Crossover: Building Block Engine or Macromutation?” Proc. Second Ann. Conf. Genetic Programming, J.R. Koza, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon, H. Iba, and R.L. Riolo, eds., pp. 9-17, July 1997.

[21] T. Back, F. Hoffmeister, and H. Schwefel, “A Survey of Evolution Strategies,” Proc. Fourth Int'l Conf. Genetic Algorithms, pp. 2-9, July 1991.

[22] P. Runkle, L. Carin, L. Couchman, J.A. Bucaro, and T.J. Yoder, “Multiaspect Identification of Submerged Elastic Targets via Wave-Based Matching Pursuits and Hidden Markov Models,” J. Acoustical Soc. Am., pp. 605-616, Aug. 1999.

[23] E.A. Jones, “Genetic Design of Antennas and Electronic Circuits,” PhD dissertation, Duke Univ., 1999.



Fig. 4. Coarse and detail filter responses for the first level of a typical signal-adapted wavelet compared to those of a CDF wavelet.