Automatic Ground-Truth Validation With Genetic Algorithms for Multispectral Image Classification


    Noureddine Ghoggali, Student Member, IEEE, and Farid Melgani, Senior Member, IEEE

Abstract—In this paper, we propose a novel method that aims at assisting the ground-truth expert through an automatic detection of potentially mislabeled learning samples. This method is based on viewing the mislabeled-sample detection issue as an optimization problem in which we search for the best subset of learning samples in terms of statistical separability between classes. This problem is formulated within a genetic optimization framework, where each chromosome represents a candidate solution for validating/invalidating the learning samples collected by the ground-truth expert. The genetic optimization process is guided by the joint optimization of two different criteria: the maximization of a between-class statistical distance and the minimization of the number of invalidated samples. Experiments conducted on both simulated and real data sets show that the proposed ground-truth validation method succeeds in: 1) detecting the mislabeled samples with high accuracy, even when up to 30% of the learning samples are mislabeled, and 2) strongly limiting the negative impact of the mislabeling issue on the accuracy of the classification process.

Index Terms—Genetic algorithms (GAs), ground-truth validation, Jeffries–Matusita (JM) distance measure, mislabeling issue, multiobjective optimization.


I. INTRODUCTION

THE typical goal of an inductive learning algorithm is to build discriminant functions from part of the available ground-truth samples (training set) so that the generalization capability of the resulting classifier on previously unseen samples is as high as possible. The generalization capability is usually quantified on another part of the ground-truth samples, termed the test set. Most works on automatic classification have focused on improving the accuracy (generalization capability) of the classification process by acting mainly on the following three levels: 1) data representation; 2) discriminant function model; and 3) the criterion on the basis of which the discriminant functions are optimized [1]. These works are, however, based on an essential assumption: that the ground-truth samples are of unquestionable quality. In this paper, we put this assumption under scrutiny and show that the accuracy of a classification process (whatever the kind of classifier used) critically depends on the quality of the adopted ground truth.

Manuscript received June 7, 2008; revised October 18, 2008 and January 2, 2009. First published March 27, 2009; current version published June 19, 2009.

The authors are with the Department of Information Engineering and Computer Science, University of Trento, 38050 Trento, Italy.

Color versions of one or more of the figures in this paper are available online.

    Digital Object Identifier 10.1109/TGRS.2009.2013693

The two well-known ground-truth collection approaches are: 1) the in situ observation approach and 2) the photo-interpretation approach [2]. Each has its own advantages and drawbacks, but both are subject to errors in the labeling process. In the first approach, errors may occur because of georeferencing problems, while in the second, spectral mismatching errors by the human analyst are the main source of problems. Since the presence of mislabeling problems (noise) in a learning (training and test) set has a direct negative impact on the classification process, the development of automatic techniques for validating the collected learning samples is, in our opinion, crucial.

To the best of our knowledge, very scarce attention has been paid in the literature to coping with this issue, which is mainly faced through two different strategies. The first one, which accepts the presence of noise (mislabeling problems) in the data, consists in designing a sophisticated classifier that is less likely to be influenced by this presence [3]. The second strategy is based on the removal of suspect samples from the learning set. An early work derived from this strategy for k-nearest neighbor (kNN) classification suggested first applying a 3NN classification over the whole learning set and then removing misclassified samples in order to produce a new learning set on the basis of which a 1NN classifier is formed for the classification phase [4]. In [5], in order to avoid overfitting on noisy samples, the author proposed to perform the removal (filtering) process through the C4.5 decision tree classifier. In [6], the suspect samples are identified and removed from the learning set by means of an ensemble of three classifiers (i.e., C4.5, kNN, and linear classifiers). In particular, a sample is expected to be mislabeled if it is misclassified by the ensemble of classifiers.
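As a point of reference, the editing scheme of [4] can be sketched as follows (a minimal illustration assuming numpy, Euclidean distances, and leave-one-out evaluation over the learning set; function names are ours):

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k, exclude_self=False):
    """Classify each query point by majority vote among its k nearest neighbors.

    exclude_self assumes X_query is the same array as X_train (leave-one-out)."""
    preds = []
    for i, q in enumerate(X_query):
        d = np.linalg.norm(X_train - q, axis=1)
        if exclude_self:
            d[i] = np.inf  # a sample cannot vote for itself
        nn = np.argsort(d)[:k]
        preds.append(np.bincount(y_train[nn]).argmax())
    return np.array(preds)

def edit_learning_set(X, y):
    """Wilson-style editing: drop samples misclassified by a leave-one-out 3NN.

    The surviving samples would then feed a 1NN classifier, as in [4]."""
    preds = knn_predict(X, y, X, k=3, exclude_self=True)
    keep = preds == y
    return X[keep], y[keep]
```

A mislabeled sample sitting inside the opposite class's cluster is outvoted by its neighbors and therefore removed, while consistently labeled samples survive.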

In this paper, we propose an alternative method that aims at interacting with the ground-truth expert by providing him/her with binary information of the kind validated/invalidated for each learning sample. For each invalidated sample, the expert may confirm the invalidation or not, and thus correct or maintain the adopted labeling before creating the final learning set that will be exploited in the classification process. Our ground-truth validation method is based on viewing the mislabeled-sample detection issue as an optimization problem in which we search for the best subset of learning samples in terms of statistical separability between classes. This problem is formulated within a genetic optimization framework, chosen for its capability to solve complex pattern recognition issues [7], [8]. In particular, each chromosome is configured as a binary string, which represents a candidate solution for validating/invalidating the available learning samples. The genetic optimization process is guided by the joint optimization of two

0196-2892/$25.00 © 2009 IEEE


    Fig. 1. Sketch illustrating the proposed ground-truth validation process.

different criteria: the maximization of a between-class statistical distance and the minimization of the number of invalidated samples. The former is expressed in terms of the Jeffries–Matusita (JM) distance measure [1], [2]. The latter allows one to obtain at convergence a Pareto front from which the ground-truth expert can select the best solution according to his/her prior confidence in the reliability of the collected ground truth.

Experiments were conducted on both simulated data sets and real remote sensing images. The obtained results reveal that the proposed automatic validation method succeeds in detecting the mislabeled samples with high accuracy, even when up to 30% of the learning samples are mislabeled. Moreover, we show how the removal of the detected mislabeled samples impacts very positively on the accuracy of different classifiers, namely, the support vector machine (SVM), the kNN, and the radial basis function (RBF) neural network [1], [9]–[15]. This paper complements and integrates partial results presented in [16].

The remaining part of this paper is organized as follows. In Section II, we recall the basic idea of the multiobjective nondominated sorting genetic algorithm (NSGA-II) and describe the proposed automatic ground-truth validation method. Experimental results obtained on simulated and real data sets are reported in Sections III and IV, respectively. Finally, conclusions are drawn in Section V.


II. PROPOSED GROUND-TRUTH VALIDATION METHOD

A. Problem Formulation

Let us consider a learning set $L$ composed of $n$ samples labeled by the ground-truth expert, such that $L = \{(\mathbf{x}_i, y_i),\ i = 1, 2, \ldots, n\}$, where each $\mathbf{x}_i \in \mathbb{R}^d$ represents a vector of $d$ remote observations and/or processed features, and $y_i \in \Omega = \{\omega_1 = 1, \omega_2 = 2, \ldots, \omega_T = T\}$ is the corresponding class label. Our objective is to detect in an automatic way which of these $n$ learning samples are potentially mislabeled and to provide the ground-truth expert with binary information of the kind validated/invalidated for each learning sample. Note that we do not aim at correcting the labels of mislabeled samples. The label correction work shall be carried out by the ground-truth expert (Fig. 1).

A naive approach to this problem would consist in trying all possible combinations of validated/invalidated learning samples and then choosing the best one according to some predefined criterion. This is, however, computationally prohibitive, and thus impractical, even for small values of $n$, since the total number of possible combinations is equal to $2^n$. Therefore, the only solution at hand is to adopt a numerical optimizer to look for the hopefully best solution in the binary solution space. In this paper, we propose to carry out this task by means of a multiobjective genetic optimization method. In the following sections, we first recall the basics of genetic algorithms (GAs). Then, after describing the two main components of the proposed genetic solution (i.e., the chromosome and the fitness functions), we explain its different phases.

    B. General Concepts on GAs

GAs are general-purpose randomized optimization techniques which exploit principles inspired by biological systems [17], [18]. A genetic optimization algorithm performs a search by evolving a population of candidate solutions (individuals) modeled as chromosomes. From one generation to the next, the population is improved by mechanisms derived from genetics, i.e., through the use of both deterministic and nondeterministic genetic operators. The most common form of GAs involves the following steps. First, an initial population of chromosomes is randomly generated. Then, the goodness of each chromosome is evaluated according to a predefined fitness function representing the considered objective function. This fitness evaluation step allows one to keep the best chromosomes and reject the worst ones by using an appropriate selection rule


    Fig. 2. Illustration of the chromosome structure and its effect on the learning sample distribution.

based on the principle that the better the fitness, the higher the chance of being selected. Once the selection process is completed, the next step is devoted to reproducing the population. This is done by genetic operators such as crossover and mutation. The entire process is iterated until a user-defined convergence criterion is reached.
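The loop above can be sketched for a generic single-objective binary GA (a minimal illustration; the operator choices and parameter values here are ours, not those adopted in the paper):

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=40, generations=60,
                      p_crossover=0.9, p_mutation=0.01, seed=0):
    """Minimal single-objective binary GA: binary tournament selection,
    one-point crossover, and per-gene bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]

        def select():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return list((a if a[0] >= b[0] else b)[1])

        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_genes)  # one-point crossover
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for g in range(n_genes):
                    if rng.random() < p_mutation:
                        child[g] = 1 - child[g]  # bit-flip mutation
                offspring.append(child)
        pop = offspring[:pop_size]
    return max(pop, key=fitness)

# Toy usage: maximize the number of 1 bits ("one-max").
best = genetic_algorithm(fitness=sum, n_genes=20)
```

On this toy problem, the returned chromosome is driven toward the all-ones string as generations pass.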

Several multiobjective GA-based approaches have been proposed in the literature [19]. In this paper, we adopt NSGA-II for its low computational requirements and its ability to distribute the solutions uniformly along the Pareto front [8], [20]. It is based on the concept of Pareto dominance. A solution s1 is said to dominate another solution s2 if s1 is not worse than s2 in all objectives and better than s2 in at least one objective. A solution is said to be nondominated if it is not dominated by any other solution. The algorithm starts by generating a random parent population. Individuals (chromosomes) selected through a crowded tournament selection undergo crossover and mutation operations to form an offspring population. Both offspring and parent populations are then combined and sorted into fronts of decreasing dominance (rank). After the sorting process, the new population is filled with solutions from the different fronts, starting from the best one. If a front can only partially fill the next generation, crowded tournament selection is used again to ensure diversity. Once the next-generation population has been filled, the algorithm loops back to create a new offspring population, and the process continues up to convergence.
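The dominance test at the core of NSGA-II is straightforward to express (a sketch assuming all objectives are to be maximized; function names are ours):

```python
def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2:
    f1 is no worse in every objective and strictly better in at least one."""
    return (all(a >= b for a, b in zip(f1, f2))
            and any(a > b for a, b in zip(f1, f2)))

def nondominated_front(points):
    """Return the objective vectors dominated by no other point (rank-1 front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Repeatedly removing the current front and recomputing it on the remainder yields the fronts of decreasing rank used by the sorting step described above.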

    C. GA Setup

The success of a genetic optimization process depends mainly on two ingredients, i.e., the chromosome structure and the fitness functions, which translate the considered optimization problem and guide the search toward the best solution, respectively.

Concerning the first ingredient, since we wish either to validate or to invalidate each of the available $n$ learning samples, we consider a population of $N$ chromosomes $C_m$ $(m = 1, 2, \ldots, N)$, where each chromosome $C_m \in \{0, 1\}^n$ is a binary vector of length $n$ encoding a candidate combination of validations and invalidations of the learning samples. As shown in Fig. 2, a gene taking the value 1 or 0 means the invalidation or validation of the corresponding sample, respectively.
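Decoding a chromosome into validated and invalidated subsets can be sketched as follows (an illustrative helper; the function name is ours):

```python
def split_learning_set(samples, labels, chromosome):
    """Apply a chromosome to a learning set: gene 0 validates a sample,
    gene 1 invalidates (flags) it for review by the ground-truth expert."""
    validated = [(x, y) for x, y, g in zip(samples, labels, chromosome) if g == 0]
    invalidated = [(x, y) for x, y, g in zip(samples, labels, chromosome) if g == 1]
    return validated, invalidated
```

Each candidate solution thus partitions the same learning set differently, and its fitness is computed on the validated part only.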

The validation/invalidation procedure is based on the hypothesis that mislabeling a learning sample potentially leads to an increase of the intraclass variability and, thus, to a decrease of the between-class distance. Therefore, as a first fitness function, we make use of a between-class statistical distance based on the well-known JM distance measure [1], [2]. This measure is a function of the Bhattacharyya distance, which is derived from the Chernoff bound, i.e., an upper bound on the probability of error of the Bayes classifier. In the case of multivariate Gaussian distributions, the JM distance between two generic classes i and j is given by


$$\mathrm{JM}_{ij} = \sqrt{2\left(1 - e^{-B_{ij}}\right)} \quad (1)$$

where $B_{ij}$ is the Bhattacharyya distance, defined as

$$B_{ij} = \frac{1}{8}(\mu_i - \mu_j)^T \left[\frac{\Sigma_i + \Sigma_j}{2}\right]^{-1} (\mu_i - \mu_j) + \frac{1}{2}\ln\left(\frac{\left|\frac{\Sigma_i + \Sigma_j}{2}\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}}\right) \quad (2)$$

where $\Sigma$ and $\mu$ denote the class covariance matrix and mean vector, respectively. The symbol $|\cdot|$ stands for the determinant operator. The JM distance is bounded by the interval $[0, \sqrt{2}]$. When the two classes are identical (and, thus, completely overlapped), it assumes the value zero. In contrast, if they are totally separated, it takes the value $\sqrt{2}$.
The assumption that classes follow a Gaussian distribution is mainly motivated by the need to derive a tractable and easy-to-implement between-class distance measure. It is, however, noteworthy that the general nature of the proposed approach makes it possible to adopt any other type of distance measure.
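Under the Gaussian assumption, (1) and (2) can be computed directly from class statistics (a sketch assuming numpy; the means and covariances are estimated elsewhere, and the function name is ours):

```python
import numpy as np

def jm_distance(mu_i, cov_i, mu_j, cov_j):
    """Jeffries-Matusita distance between two Gaussian classes:
    JM = sqrt(2 * (1 - exp(-B))), with B the Bhattacharyya distance of (2)."""
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    cov_i, cov_j = np.asarray(cov_i, float), np.asarray(cov_j, float)
    cov_avg = (cov_i + cov_j) / 2.0
    diff = mu_i - mu_j
    b = (diff @ np.linalg.inv(cov_avg) @ diff) / 8.0 \
        + 0.5 * np.log(np.linalg.det(cov_avg)
                       / np.sqrt(np.linalg.det(cov_i) * np.linalg.det(cov_j)))
    return np.sqrt(2.0 * (1.0 - np.exp(-b)))
```

Identical classes yield a distance of 0, while well-separated classes approach the upper bound of sqrt(2).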

At this point, in order to be suitably guided, the genetic optimization process needs a piece of information from the ground-truth expert, i.e., the expected amount of mislabeled learning samples. Without this information, the process would tend to invalidate all the learning samples but two (the most distant ones), i.e., one for each class. With this information, we could envision running a constrained genetic optimization process, which at convergence would provide the best subset of invalidated samples with a prespecified cardinality. The main drawback of this genetic implementation is that it requires exact knowledge


of the amount of mislabeled learning samples. As a more practical alternative, we propose to run a multiobjective genetic optimization process based on NSGA-II, where the second fitness function would simpl...
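Given the two criteria stated earlier, a per-chromosome objective evaluation can be sketched for the simple case of scalar features (an illustration under assumptions of ours: the separability objective is taken here as the minimum pairwise JM distance among the classes of the validated samples, the second objective counts invalidated samples, and every class is assumed to keep at least one validated sample):

```python
import math

def jm_1d(m_i, v_i, m_j, v_j):
    """Scalar-feature JM distance between two Gaussian classes ((1)-(2), d = 1)."""
    v_avg = (v_i + v_j) / 2.0
    b = (m_i - m_j) ** 2 / (8.0 * v_avg) \
        + 0.5 * math.log(v_avg / math.sqrt(v_i * v_j))
    return math.sqrt(2.0 * (1.0 - math.exp(-b)))

def chromosome_objectives(chromosome, x, y):
    """Objectives for one chromosome: f1 (to maximize) is the minimum pairwise
    JM separability of the validated samples; f2 (to minimize) is the number
    of invalidated samples."""
    kept = [(xi, yi) for xi, yi, g in zip(x, y, chromosome) if g == 0]
    stats = {}
    for label in set(yi for _, yi in kept):
        vals = [xi for xi, yi in kept if yi == label]
        m = sum(vals) / len(vals)
        v = sum((u - m) ** 2 for u in vals) / len(vals) or 1e-9  # avoid zero variance
        stats[label] = (m, v)
    labels = sorted(stats)
    f1 = min(jm_1d(*stats[a], *stats[b]) for a in labels for b in labels if a < b)
    f2 = sum(chromosome)
    return f1, f2
```

Invalidating a mislabeled sample shrinks the intraclass variance of the class it was wrongly assigned to, which raises f1, while f2 penalizes over-invalidation; NSGA-II trades the two off along the Pareto front.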
