This article was downloaded by: [Qian Du]
On: 29 July 2012, At: 00:44
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Geocarto International
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tgei20

Hyperspectral band clustering and band selection for urban land cover classification
Hongjun Su (a) & Qian Du (b)
(a) School of Earth Sciences and Engineering, Hohai University, Nanjing, China
(b) Department of Electrical and Computer Engineering, Mississippi State University, Starkville, USA

Accepted author version posted online: 29 Nov 2011. Version of record first published: 12 Jan 2012

To cite this article: Hongjun Su & Qian Du (2012): Hyperspectral band clustering and band selection for urban land cover classification, Geocarto International, 27:5, 395-411

To link to this article: http://dx.doi.org/10.1080/10106049.2011.643322

PLEASE SCROLL DOWN FOR ARTICLE

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.


Hyperspectral band clustering and band selection for urban land cover classification

Hongjun Su (a) and Qian Du (b)*

(a) School of Earth Sciences and Engineering, Hohai University, Nanjing, China; (b) Department of Electrical and Computer Engineering, Mississippi State University, Starkville, USA

    (Received 1 August 2011; final version received 18 November 2011)

The aim of this study is to combine band clustering with band selection for dimensionality reduction of hyperspectral imagery. The performance of dimensionality reduction is evaluated through the urban land cover classification accuracy obtained with the dimensionality-reduced data. Different from unsupervised clustering using all the pixels or supervised clustering requiring labelled pixels, the discussed semi-supervised band clustering needs class spectral signatures only; the band selection result is used as the initial condition for band clustering; after clustering, a cluster selection step is applied to select the clusters to be used in the following data analysis. In this article, we propose to conduct band selection by removing outlier bands in each cluster before finalizing the cluster centres. The experimental results in urban land cover classification show that the proposed algorithm can further enhance support vector machine (SVM)-based classification accuracy.

Keywords: hyperspectral imagery; dimensionality reduction; band clustering; band selection; urban land cover classification

    1. Introduction

A hyperspectral imaging sensor collects hundreds of spectral bands with very fine spectral resolution for the same area on the earth. Its abundant spectral information provides the potential for accurate object classification and identification. However, its vast data volume causes problems in data transmission and storage. In particular, the very high data dimensionality presents a challenge to many traditional image analysis algorithms. Dimensionality reduction of hyperspectral imagery is often achieved by band selection, whose objective is to find a small subset of bands containing the important data information. Another approach is band grouping or band clustering. For instance, adjacent bands can be grouped together, and a representative of each group can be selected to participate in the following data analysis. Intuitively, adjacent bands can be partitioned uniformly or based on the spectral correlation coefficient. Figure 1 shows a 126 × 126 spectral correlation coefficient matrix, where a bright pixel at location (i, j) means high correlation between the i-th and j-th bands, and a dark pixel means low correlation. The white blocks along the diagonal indicate that adjacent bands usually have high correlation and should be grouped together. However, non-adjacent bands may also have high correlation in Figure 1, as indicated by the white blocks in off-diagonal areas. Thus, non-adjacent bands should be allowed to be grouped together. While most research in the literature addresses band selection, our approach using band clusters can provide better results.

*Corresponding author. Email: [email protected]

Geocarto International
Vol. 27, No. 5, August 2012, 395–411
ISSN 1010-6049 print/ISSN 1752-0762 online
© 2012 Taylor & Francis
http://dx.doi.org/10.1080/10106049.2011.643322
http://www.tandfonline.com
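A spectral correlation coefficient matrix like the one in Figure 1 can be computed by flattening every band into a vector and correlating the band vectors pairwise. The following is a minimal NumPy sketch, assuming the cube is stored as a (rows, cols, L) array; the function name `band_correlation_matrix` and the toy data are hypothetical and not taken from the article.

```python
import numpy as np


def band_correlation_matrix(cube: np.ndarray) -> np.ndarray:
    """Return the L x L spectral correlation coefficient matrix of a cube.

    `cube` has shape (rows, cols, L); every band image is flattened into an
    N-dimensional vector (N = rows * cols) before the bands are correlated.
    """
    rows, cols, n_bands = cube.shape
    band_vectors = cube.reshape(rows * cols, n_bands).T  # shape (L, N)
    # np.corrcoef treats each row as one variable, i.e. one band here
    return np.corrcoef(band_vectors)


# Toy cube: band 1 is a linear function of band 0, band 2 is independent noise
rng = np.random.default_rng(0)
base = rng.random((4, 5))
cube = np.stack([base, 2.0 * base + 1.0, rng.random((4, 5))], axis=-1)
R = band_correlation_matrix(cube)
```

Entries of `R` close to 1 correspond to the bright pixels in Figure 1; displaying `R` as an image for a real cube shows the diagonal and off-diagonal blocks discussed above.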

The typical implementation of clustering is to cluster pixels based on their spectral signatures so as to spatially segment an image scene into many sub-regions. Band clustering is another type of implementation in the spatial domain; in other words, each spectral band is converted into a vector by column- or row-stacking, and these band vectors are then clustered into several groups based on their similarity. In Martínez-Usó et al. (2007), two clustering methods, i.e. Ward's linkage strategy using mutual information (WaLuMI) and Ward's linkage strategy using divergence (WaLuDi), were developed, and the finalized clusters were used for band selection.
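The difference between pixel clustering and band clustering is only which axis of the data is treated as the sample. A minimal NumPy sketch, assuming a (rows, cols, L) cube; the helper names `to_band_vectors` and `to_pixel_vectors` are hypothetical:

```python
import numpy as np


def to_band_vectors(cube: np.ndarray, order: str = "C") -> np.ndarray:
    """Turn a (rows, cols, L) cube into an (L, N) array, one vector per band.

    order="C" row-stacks each band image, order="F" column-stacks it.
    """
    rows, cols, n_bands = cube.shape
    return np.stack([cube[:, :, l].reshape(rows * cols, order=order)
                     for l in range(n_bands)])


def to_pixel_vectors(cube: np.ndarray) -> np.ndarray:
    """Pixel clustering instead works on N spectra of length L: shape (N, L)."""
    rows, cols, n_bands = cube.shape
    return cube.reshape(rows * cols, n_bands)


cube = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # 2 x 3 pixels, 4 bands
bands_rows = to_band_vectors(cube, "C")
bands_cols = to_band_vectors(cube, "F")
pixels = to_pixel_vectors(cube)
```

With row-stacking the band matrix is simply the transpose of the pixel matrix. Row- and column-stacking permute the entries of every band vector in the same way, so distances between bands, such as the Euclidean distance, do not depend on that choice.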

In our research, we focus on k-means-based band clustering for dimensionality reduction (Mojaradi et al. 2008). One of its drawbacks is that it is sensitive to the initial condition and may be trapped in local optima; different initial conditions may produce different clusters. We have proposed a new initialization technique using the band selection output (Su et al. 2011). Because it is unsupervised, k-means clustering uses all the pixels and may therefore be time-consuming. In Al-Harbi and Rayward-Smith (2006), k-means was extended to a supervised version, where training samples for each class were required for clustering. However, in practice, it may be difficult to obtain enough training samples; instead, it may be possible to have a spectral signature for each class.

Therefore, in Su et al. (2011) we proposed a semi-supervised k-means clustering method that uses class signatures only (a class signature is the representative spectrum of a class). Instead of using the band closest to each cluster centre, we use the band cluster centres in the following data analysis (e.g. detection and classification). We have shown that using cluster centres is better than using selected bands. We also conducted cluster selection and showed that deleting the worst cluster provided better performance. The initial condition is critical to clustering performance, and the result of our band selection method (Du and Yang 2008) can be used for initialization.

In this article, we propose to conduct band selection by removing outlier bands in each cluster before finalizing the cluster centres. The resulting cluster centres can better

    Figure 1. Spectral correlation coefficient matrix for a 126-band HyMap image.


represent the key spectral features in the corresponding clusters, thereby further enhancing classification performance. Outlier bands are determined by a similarity metric, such as Euclidean distance (EUD), Mahalanobis distance (MD), spectral angle mapper (SAM) or spectral information divergence (SID) (Chang 2000). Here, we adopt orthogonal projection divergence (OPD) (Chang 2003), which showed the best performance for the hyperspectral digital imagery collection experiment (HYDICE) and HyMap urban images in our experiments.
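For comparison with OPD, three of the alternative metrics named above can be written in a few lines. This is a sketch using the common textbook definitions of EUD, SAM and SID; MD is left out because it additionally needs a covariance estimate, and the article's experiments may use slightly different variants. The function names are hypothetical.

```python
import numpy as np


def eud(x: np.ndarray, y: np.ndarray) -> float:
    """Euclidean distance (EUD)."""
    return float(np.linalg.norm(x - y))


def sam(x: np.ndarray, y: np.ndarray) -> float:
    """Spectral angle (radians) between x and y (SAM)."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards round-off


def sid(x: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Spectral information divergence (SID), assuming non-negative vectors.

    Each vector is normalised to a probability distribution and SID is the
    symmetric sum of the two relative entropies.
    """
    p = np.clip(x, eps, None)
    q = np.clip(y, eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 2.0, 1.0])
```

SAM and SID ignore a global scaling of either vector, whereas EUD does not; this is one reason the choice of metric can change which bands are flagged as outliers.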

    2. Methodology

    2.1 Band clustering

Given a set of bands $(B_1, \ldots, B_l, \ldots, B_L)$, where each band is arranged into an N-dimensional vector and N is the number of pixels, k-means band clustering aims to partition the L bands into k clusters $C = \{C_1, \ldots, C_m, \ldots, C_k\}$ $(1 \le m \le k)$ so as to minimize the following objective function:

$$\arg\min_{C} \sum_{m=1}^{k} \sum_{B_l \in C_m} D(B_l, \mu_m) \qquad (1)$$

where $\mu_m$ is the cluster centre of $C_m$ and $D(\cdot, \cdot)$ is a distance metric gauging the similarity between a band and the centre of the cluster it is assigned to. The computational complexity of this clustering is linear in the number of pixels N. To reduce the complexity, we use class signatures as the algorithm input, so that each band is represented by its values in the S class signatures and the complexity becomes linear in the number of signatures S $(S \ll N)$. This approach is denoted as semi-supervised k-means (SKM). When only the band closest to each cluster centre is used in the following analysis, the resulting method is denoted as SKM(BS).
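The semi-supervised clustering of eq. (1) can be sketched as ordinary k-means in which each band is described by its S class-signature values. This is a minimal sketch under stated assumptions: the signatures are arranged as an S × L array, D is taken to be the Euclidean distance, and the initial centres are given band indices (for example from band selection). The function name `skm_band_clustering` is hypothetical.

```python
import numpy as np


def skm_band_clustering(signatures: np.ndarray, init_bands, max_iter: int = 100):
    """Semi-supervised k-means over bands (sketch).

    signatures : (S, L) array, one row per class signature.
    init_bands : indices of the k bands used as initial cluster centres.
    Each band l is represented by the S-vector signatures[:, l], and
    D(., .) in eq. (1) is taken to be the Euclidean distance (an assumption).
    Returns (labels, centres): labels has length L, centres has shape (k, S).
    """
    band_vecs = np.asarray(signatures, dtype=float).T        # (L, S)
    centres = band_vecs[list(init_bands)].copy()             # (k, S)
    labels = None
    for _ in range(max_iter):
        # Distance from every band to every centre: shape (L, k)
        dists = np.linalg.norm(band_vecs[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                            # no band moved: converged
        labels = new_labels
        for m in range(len(centres)):
            members = band_vecs[labels == m]
            if len(members) > 0:                             # keep old centre if empty
                centres[m] = members.mean(axis=0)
    return labels, centres


# Toy example: S = 2 signatures, L = 6 bands forming two obvious groups
sig = np.array([[0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
                [0.0, 0.2, 0.1, 10.0, 10.2, 10.1]])
labels, centres = skm_band_clustering(sig, init_bands=[0, 3])
```

Each iteration costs on the order of L·k·S operations instead of L·k·N, which is the saving described above.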

The SKM algorithm is initialized by using distinctive bands as cluster centroids. The idea of selecting distinctive bands in an unsupervised manner was presented in Du and Yang (2008). The band selection algorithm is initialized by choosing a pair of bands $B_1$ and $B_2$, leading to a band subset $F = \{B_1, B_2\}$; it then finds a third band $B_3$ that is the most dissimilar to all the bands in the current F according to a certain criterion, resulting in an updated subset $F = F \cup \{B_3\}$; this selection step is repeated until the number of bands in F is large enough. Here, the linear prediction (LP) error (i.e. the difference between an original band and its linearly predicted version using the bands in F) is employed as the similarity metric. The band with the maximum LP error is the most dissimilar to those in F and should be selected.
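The greedy LP-based selection can be sketched with least squares. This assumes an affine predictor (the selected bands plus a constant term) and leaves the choice of the initial pair to the caller; Du and Yang (2008) may differ in such details, and the function name `lp_band_selection` is hypothetical.

```python
import numpy as np


def lp_band_selection(band_vecs: np.ndarray, k: int, init_pair=(0, 1)):
    """Greedy selection of k distinctive bands by linear-prediction (LP) error.

    band_vecs : (L, N) array, one row per band (pixels or signatures as samples).
    init_pair : the starting bands B1, B2; how they are chosen is left open here.
    At each step the band worst predicted (largest LP error) by an affine
    combination of the already-selected bands is added to F.
    """
    n_bands, n_samples = band_vecs.shape
    selected = list(init_pair)
    k = min(k, n_bands)
    while len(selected) < k:
        # Design matrix: the selected bands plus a constant column
        A = np.column_stack([band_vecs[selected].T, np.ones(n_samples)])
        best_band, best_err = None, -1.0
        for l in range(n_bands):
            if l in selected:
                continue
            coef, *_ = np.linalg.lstsq(A, band_vecs[l], rcond=None)
            err = float(np.linalg.norm(band_vecs[l] - A @ coef))  # LP error
            if err > best_err:
                best_band, best_err = l, err
        selected.append(best_band)   # most dissimilar band joins F
    return selected


rng = np.random.default_rng(1)
x, y, z = rng.random((3, 50))
bands = np.stack([x, y, 2.0 * x + 1.0, z])   # band 2 is predictable from band 0
selection = lp_band_selection(bands, k=3)
```

In the toy data, band 2 is perfectly predicted by band 0, so the independent band 3 is chosen next; the selected indices can then serve as `init_bands` for k-means.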

After k-means clustering, k clusters with their centroids are ready for further analysis. However, this does not mean that all of them should be used. Some clusters may not be helpful for object classification, and they may even cause confusion. Thus, we propose to remove one cluster by exhaustively searching for the worst one, i.e. the cluster whose removal leaves clusters that provide the classification maps most similar to those obtained using all the original bands. It is observed that deleting one cluster usually results in improvement, but deleting more than one cluster does not necessarily provide further improvement. Thus, only one cluster is removed hereafter. The SKM algorithm that deletes the worst cluster is denoted as SKMd.
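The exhaustive search for the worst cluster amounts to k trial removals, each scored by how well the resulting classification map agrees with the all-band map. A minimal sketch; `delete_worst_cluster` is a hypothetical name, and `map_agreement` is a hypothetical callback standing in for "classify with these clusters and compare with the all-band map", which the article does not specify in code form.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def delete_worst_cluster(clusters: Sequence[T],
                         map_agreement: Callable[[List[T]], float]) -> Tuple[int, List[T]]:
    """Exhaustively try removing each cluster and keep the best removal.

    `map_agreement` is a hypothetical callback: given the remaining clusters it
    should classify the image and return how closely the resulting map agrees
    with the map obtained from all original bands (higher is better).
    Returns the index of the removed cluster and the remaining clusters.
    """
    best_idx, best_score = -1, float("-inf")
    for i in range(len(clusters)):
        remaining = [c for j, c in enumerate(clusters) if j != i]
        score = map_agreement(remaining)
        if score > best_score:
            best_idx, best_score = i, score
    kept = [c for j, c in enumerate(clusters) if j != best_idx]
    return best_idx, kept


# Toy scoring: pretend the maps agree best once cluster "B" is gone
idx, kept = delete_worst_cluster(["A", "B", "C"],
                                 lambda rem: 1.0 if "B" not in rem else 0.5)
```

Because each trial requires a full classification, the cost grows linearly with k, which is one practical reason to stop after removing a single cluster.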


2.2 Outlier band removal (BR)

Within a band cluster, bands contribute differently to class separation. For instance, a band far away from the cluster centre may be considered an outlier, or anomalous, band whose spectral features are quite different from those of the other bands in the same cluster. Based on our experience, such outlier bands should be removed and the cluster centre recalculated with the remaining bands; using these final cluster centres can improve performance. The resulting algorithm is denoted as SKMd-BR.

Whether a band is removed is decided with a similarity metric. In this article, we adopt OPD (Chang 2003), which is based on the concept of orthogonal subspace projection (Harsanyi and Chang 1994). Let $\mathbf{c}_j$ denote the j-th cluster centroid and $\mathbf{b}_{ij}$ the i-th band in the j-th cluster. Their OPD value is defined as

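$$\mathrm{OPD}(\mathbf{b}_{ij}, \mathbf{c}_j) = \left( \mathbf{b}_{ij}^{T} \mathbf{P}^{\perp}_{\mathbf{c}_j} \mathbf{b}_{ij} + \mathbf{c}_j^{T} \mathbf{P}^{\perp}_{\mathbf{b}_{ij}} \mathbf{c}_j \right)^{1/2} \qquad (2)$$

where $\mathbf{P}^{\perp}_{\mathbf{x}} = \mathbf{I} - \mathbf{x}(\mathbf{x}^{T}\mathbf{x})^{-1}\mathbf{x}^{T}$ for $\mathbf{x} = \mathbf{b}_{ij}$ or $\mathbf{c}_j$, and $\mathbf{I}$ is an identity matrix. $\mathbf{P}^{\perp}_{\mathbf{c}_j}$ projects onto the orthogonal complement of $\mathbf{c}_j$, and since it is symmetric and idempotent, $\mathbf{b}_{ij}^{T} \mathbf{P}^{\perp}_{\mathbf{c}_j} \mathbf{b}_{ij}$ is the squared norm of the projection of $\mathbf{b}_{ij}$ onto that complement. Similarly, $\mathbf{c}_j^{T} \mathbf{P}^{\perp}_{\mathbf{b}_{ij}} \mathbf{c}_j$ is the squared norm of the projection of $\mathbf{c}_j$ onto the orthogonal complement of $\mathbf{b}_{ij}$. A larger OPD value means that $\mathbf{b}_{ij}$ and $\mathbf{c}_j$ are more different, so $\mathbf{b}_{ij}$ is more likely to be an outlier.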
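Eq. (2) can be evaluated literally with the two projectors, but for band vectors with N entries this forms N × N matrices. Expanding the projector gives the equivalent $\mathbf{b}^{T}\mathbf{P}^{\perp}_{\mathbf{c}}\mathbf{b} = \mathbf{b}^{T}\mathbf{b} - (\mathbf{b}^{T}\mathbf{c})^{2}/(\mathbf{c}^{T}\mathbf{c})$. A sketch of both forms (function names hypothetical):

```python
import numpy as np


def orth_projector(x: np.ndarray) -> np.ndarray:
    """P_x^perp = I - x (x^T x)^{-1} x^T for a single vector x."""
    x = np.asarray(x, dtype=float).ravel()
    return np.eye(x.size) - np.outer(x, x) / np.dot(x, x)


def opd_explicit(b, c) -> float:
    """Eq. (2) evaluated literally with the two projectors (O(N^2) memory)."""
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    val = b @ orth_projector(c) @ b + c @ orth_projector(b) @ c
    return float(np.sqrt(max(val, 0.0)))  # clip tiny negative round-off


def opd(b, c) -> float:
    """Same value without forming N x N matrices, using
    b^T P_c^perp b = b^T b - (b^T c)^2 / (c^T c)."""
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    bc = np.dot(b, c)
    val = (np.dot(b, b) - bc ** 2 / np.dot(c, c)) + (np.dot(c, c) - bc ** 2 / np.dot(b, b))
    return float(np.sqrt(max(val, 0.0)))
```

OPD is zero for parallel vectors and, unlike SAM, grows with the vector norms: for orthogonal vectors it equals the square root of the sum of their squared norms.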

    2.3 The proposed algorithm

The proposed SKMd-BR algorithm proceeds as follows.

(1) Initialize the algorithm by using k selected distinctive bands.
(2) With the known class signatures, conduct k-means band clustering. The clustering is complete when no band moves from one cluster to another. Compute the band cluster centroids by averaging all the bands in each cluster.
(3) Use OPD to compute the pair-wise similarity between clusters. The cluster with the largest average OPD is removed. The resulting k−1 clusters are the final band clustering result.
(4) Calculate the OPD value between each band and its cluster centroid. A certain percentage of the bands with the largest OPD values are removed. The k−1 cluster centroids are updated with the remaining bands and are the final outputs.
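Steps (3) and (4) can be sketched on top of an existing clustering. This follows the list literally (drop the cluster with the largest average pair-wise OPD, then drop the highest-OPD share of bands in each remaining cluster). How the percentage is rounded is not stated in the article; rounding down while keeping at least one band is an assumption here, as is the function name `skmd_br_final_steps`.

```python
import numpy as np


def opd(b: np.ndarray, c: np.ndarray) -> float:
    """OPD of eq. (2) in closed form: b^T P_c^perp b = b^T b - (b^T c)^2 / (c^T c)."""
    bc = np.dot(b, c)
    val = np.dot(b, b) - bc ** 2 / np.dot(c, c) + np.dot(c, c) - bc ** 2 / np.dot(b, b)
    return float(np.sqrt(max(val, 0.0)))


def skmd_br_final_steps(band_vecs: np.ndarray, labels: np.ndarray,
                        remove_frac: float = 0.10) -> dict:
    """Steps (3)-(4) of SKMd-BR on an existing clustering (needs k >= 2).

    band_vecs : (L, D) array of band vectors; labels : cluster id of each band.
    Step (3) drops the cluster whose centroid has the largest average OPD to
    the other centroids.  Step (4) drops the `remove_frac` share of bands with
    the largest OPD to their centroid in each remaining cluster (rounded down,
    keeping at least one band) and recomputes the centroid.
    Returns {cluster_id: (kept_band_indices, updated_centroid)}.
    """
    ids = sorted(set(labels.tolist()))
    cent = {m: band_vecs[labels == m].mean(axis=0) for m in ids}
    # Step (3): average pair-wise OPD between centroids
    avg = {m: np.mean([opd(cent[m], cent[n]) for n in ids if n != m]) for m in ids}
    dropped = max(avg, key=avg.get)
    result = {}
    for m in ids:
        if m == dropped:
            continue
        members = np.flatnonzero(labels == m)
        scores = np.array([opd(band_vecs[i], cent[m]) for i in members])
        n_drop = min(int(np.floor(remove_frac * len(members))), len(members) - 1)
        keep = np.sort(members[np.argsort(scores)[: len(members) - n_drop]])
        result[m] = (keep, band_vecs[keep].mean(axis=0))   # step (4) update
    return result
```

With `remove_frac=0.10` this corresponds to the SKMd-BR(10%) setting used in the experiments; a cluster of fewer than ten bands loses no band under the rounding assumed here.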

    3. Experiments

Two real-data experiments were conducted. Clustering quality was evaluated with classification accuracy. Since training and test samples were available, a support vector machine (SVM) was applied (Burges 1998), using the libSVM library (http://www.csie.ntu.edu.tw/cjlin/libsvm/). The proposed SKMd-BR was compared against SKMd and SKMd(BS), since SKMd outperformed other band clustering methods in our previous work (Su et al. 2011).
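To feed the dimensionality-reduced data to a classifier, each final cluster centre can be viewed as an image: for every pixel, the mean of its values over the bands kept in that cluster. The sketch below reflects that interpretation of "averaging all the bands clustered"; it only builds the N × K feature matrix, and the SVM training itself (done with libSVM in the article) is not shown. The function name is hypothetical.

```python
import numpy as np


def cluster_centre_features(cube: np.ndarray, clusters) -> np.ndarray:
    """Reduce a (rows, cols, L) cube to an (N, K) feature matrix.

    clusters : list of K index sequences, the bands kept in each final cluster.
    Feature m of a pixel is the mean of that pixel's values over the bands of
    cluster m, i.e. the cluster centre viewed as an image.
    """
    pixels = cube.reshape(-1, cube.shape[-1])           # (N, L)
    return np.column_stack([pixels[:, list(idx)].mean(axis=1) for idx in clusters])


cube = np.arange(2 * 2 * 4, dtype=float).reshape(2, 2, 4)   # 4 pixels, 4 bands
features = cluster_centre_features(cube, [[0, 1], [3]])
```

For SKMd(BS) one would instead take a single representative band per cluster, i.e. one column of `pixels` per cluster.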

    3.1 Hyperspectral digital imagery collection experiment

Data collected by the airborne HYDICE sensor were used, covering the 0.4–2.5 μm spectral range with 210 bands and 10 nm spectral resolution. The subimage scene


with 304 × 301 pixels over the Washington DC Mall area, with about 2.8 m spatial resolution, is shown in Figure 2. After bad band removal, 191 bands were used. Six classes are present in this image scene: roof, tree, grass, shadow, road and trail. The centres of these six classes were used as the class signatures for band clustering. The overall accuracy (OA) from SVM was computed with the training and test samples listed in Table 1.

As shown in Figure 3, SKMd-BR provided the best results as k was varied from 5 to 15, where SKMd-BR(10%) indicates that 10% of the bands were removed from each cluster. Figure 4 shows the performance variation when different similarity metrics were adopted for SKMd-BR; OPD yielded the best results. Table 2 lists the accuracy when 5%, 10% and 15% of the bands were removed; no conclusion could be drawn about which percentage was best, but the differences in overall performance were small.

Figure 5 presents classification maps when using six bands or clusters. There was a significant amount of misclassification between trail (in yellow) and roof (in orange).

Figure 2. The image scene used in HYDICE (bands 47 (0.63 μm), 35 (0.55 μm), 15 (0.45 μm)).


With six selected bands, SKMd(BS) could slightly reduce the yellow (trail) areas that should have been orange (roof) (as highlighted in the two circles), but SKMd using cluster centres could significantly reduce these yellow areas. SKMd-BR further enlarged the orange (roof) areas, but not significantly. Tables 3–6 are the confusion matrices for the four cases (all bands, SKMd(BS), SKMd and SKMd-BR), which show the improvement in class separation more clearly, particularly from SKMd-BR.
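The summary figures in the confusion matrices (OA, Kappa, user's and producer's accuracies) follow from the matrix itself. A sketch, assuming rows are classified classes and columns are ground-truth classes as laid out in Tables 3–6, and assuming the usual Cohen's kappa (the article does not state its kappa formula); the function name is hypothetical.

```python
import numpy as np


def accuracy_summary(cm):
    """OA, kappa, user's and producer's accuracies from a confusion matrix.

    cm[i, j] = number of test pixels classified as class i whose ground truth
    is class j (rows = classified, columns = ground truth, as in Tables 3-6).
    All values are fractions; multiply by 100 for percentages.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)
    row_tot, col_tot = cm.sum(axis=1), cm.sum(axis=0)
    oa = correct.sum() / total
    chance = np.dot(row_tot, col_tot) / total ** 2      # expected chance agreement
    kappa = (oa - chance) / (1.0 - chance)
    users = correct / row_tot        # correct / no. classified pixels
    producers = correct / col_tot    # correct / no. ground-truth pixels
    return oa, kappa, users, producers


oa, kappa, users, producers = accuracy_summary([[50, 10], [5, 35]])
```

User's accuracy reflects commission errors (row-wise) and producer's accuracy reflects omission errors (column-wise), which is why the two can differ sharply for the same class, as for roof in Table 3.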

Table 1. Training and testing samples used in the HYDICE experiment.

Class       Training   Testing
Road              55       892
Grass             57       910
Trail             50       567
Tree              46       624
Shadow            49       656
Roof              52      1123
Total            309      4772

    Figure 3. Classification accuracy in HYDICE (SKMd-BR with OPD).

    Figure 4. Different similarity metrics used for SKMd-BR in HYDICE.


Table 2. Classification accuracy vs. the percentage of bands removed in each cluster in the HYDICE experiment.

k                      5       6       7       8       9      10      11      12      13      14      15
SKMd              0.9530  0.9570  0.9468  0.9422  0.9377  0.9392  0.9434  0.9449  0.9432  0.9466  0.9466
SKMd-BR(5%)       0.9560  0.9612  0.9585  0.9528  0.9486  0.9539  0.9568  0.9516  0.9577  0.9541  0.9442
SKMd-BR(10%)      0.9579  0.9608  0.9575  0.9533  0.9476  0.9541  0.9568  0.9528  0.9579  0.9537  0.9436
SKMd-BR(15%)      0.9600  0.9598  0.9570  0.9537  0.9480  0.9547  0.9583  0.9539  0.9577  0.9530  0.9440


3.2 HyMap experiment

Figure 6 shows 126-band airborne HyMap data (0.45–2.48 μm spectral coverage, about 16 nm spectral resolution) acquired over a residential area near the campus of Purdue University in 1999. The image size is 377 × 512 pixels, and the spatial resolution is about 5 m. The image scene includes six classes: {road, grass, shadow, soil, tree, roof}. As listed in Table 7, 404 training samples and 5443 testing samples were available. Compared to the HYDICE image, the roof class in this image was more spectrally homogeneous. However, the road class had within-class spectral variation, particularly in the upper right subdivision.

As shown in Figure 7, SKMd-BR still provided the best results for varied k, but the performance of SKMd was closer to that of SKMd-BR in this experiment. Figure 8 shows the performance variation with different similarity metrics; as before, OPD yielded the best overall results. Table 8 lists the accuracy when 5%, 10% and 15% of the bands were removed; in this case, 10% removal was the best.

    Figure 5. Classification maps in HYDICE (with six bands or six clusters).


Table 3. The confusion matrix from using all the 191 bands in the HYDICE experiment.

                        Ground truth
Classified                 Road  Grass  Trail   Tree Shadow   Roof  No. classified pixels  User's accuracy (%)
Road                        861      0     69      0      0     32                    962                89.50
Grass                         0    882      0      4      6      0                    892                98.88
Trail                         1      0    498      0      0      2                    501                99.40
Tree                          0      0      0    604      0    125                    729                89.48
Shadow                        0     28      0      0    647      0                    675                95.85
Roof                         30      0      0     15      3    964                   1012                95.26
No. ground truth pixels     892    910    567    624    656   1123             OA = 93.40
Producer's accuracy (%)   96.52  96.92  87.83  96.79  98.63  85.84          Kappa = 91.97

Table 4. The confusion matrix from using six SKMd(BS) selected bands in the HYDICE experiment.

                        Ground truth
Classified                 Road  Grass  Trail   Tree Shadow   Roof  No. classified pixels  User's accuracy (%)
Road                        842      0     54      0      0     37                    933                90.25
Grass                         0    884      0      5      6      0                    895                98.77
Trail                         2      0    513      0      0      0                    515                99.61
Tree                          0      1      0    598      0     24                    623                95.99
Shadow                        0     24      0      0    650      0                    674                96.44
Roof                         48      1      0     20      0   1062                   1131                93.90
No. ground truth pixels     892    910    567    624    656   1123             OA = 95.35
Producer's accuracy (%)   94.39  97.14  90.48  95.83  99.09  94.57          Kappa = 94.32

Table 5. The confusion matrix from using six SKMd selected clusters in the HYDICE experiment.

                        Ground truth
Classified                 Road  Grass  Trail   Tree Shadow   Roof  No. classified pixels  User's accuracy (%)
Road                        874      0     47      0      0     41                    962                90.85
Grass                         0    870      0      9      8      0                    887                98.08
Trail                         2      0    520      0      1      0                    523                99.43
Tree                          0      0      0    610      0     37                    647                94.28
Shadow                        0     40      0      0    647      0                    687                94.18
Roof                         16      0      0      4      0   1045                   1065                98.11
No. ground truth pixels     892    910    567    624    656   1123             OA = 95.70
Producer's accuracy (%)   97.98  95.60  91.71  97.76  98.63  93.05          Kappa = 94.76


Figure 9 presents classification maps when using six bands or clusters. The improvement in the vegetation area (highlighted in the cyan circles) was obvious. In the roof areas circled in blue and magenta, SKMd(BS) could slightly reduce the misclassified grey (road) areas, but SKMd using cluster centres could reduce the grey (and black) areas more significantly. SKMd-BR further enlarged the

Table 6. The confusion matrix from using six SKMd-BR selected clusters in the HYDICE experiment.

                        Ground truth
Classified                 Road  Grass  Trail   Tree Shadow   Roof  No. classified pixels  User's accuracy (%)
Road                        872      0     44      0      0     41                    957                91.12
Grass                         0    861      0     10      7      0                    878                98.06
Trail                         2      0    523      0      3      0                    528                99.05
Tree                          0      0      0    609      0      9                    618                98.54
Shadow                        0     49      0      0    646      0                    695                92.95
Roof                         18      0      0      4      0   1073                   1095                98.00
No. ground truth pixels     892    910    567    624    656   1123             OA = 96.08
Producer's accuracy (%)   97.76  94.62  92.24  97.60  98.48  95.55          Kappa = 95.21

Figure 6. The image scene used in the HyMap experiment (bands 14 (0.65 μm), 8 (0.55 μm), 2 (0.45 μm)).


orange areas, but not significantly. Tables 9–12 are the confusion matrices for the four cases, which better demonstrate the improvement in class separation from SKMd-BR.

    3.3 Uncertainty analysis

The non-parametric McNemar's test was used to evaluate the statistical significance of the accuracy improvement with the proposed method (Foody 2004). It is

Table 7. Training and testing samples used in the HyMap experiment.

Class       Training   Testing
Road              73      1230
Grass             72      1072
Shadow            49       213
Soil              69       371
Tree              67      1321
Roof              74      1236
Total            404      5443

    Figure 7. Classification accuracy in HyMap experiment.

    Figure 8. Different similarity metrics used for SKMd-BR in HyMap experiment.


Table 8. Classification accuracy vs. the percentage of bands removed in each cluster in the HyMap experiment.

k                      5       6       7       8       9      10      11      12      13      14      15
SKMd              0.8723  0.8955  0.9324  0.9300  0.9300  0.9269  0.9291  0.9272  0.9252  0.9230  0.9217
SKMd-BR(5%)       0.8826  0.9008  0.9320  0.9295  0.9344  0.9346  0.9295  0.9302  0.9285  0.9256  0.9256
SKMd-BR(10%)      0.8835  0.9023  0.9322  0.9295  0.9359  0.9342  0.9306  0.9306  0.9283  0.9261  0.9247
SKMd-BR(15%)      0.8821  0.9043  0.9302  0.9300  0.9364  0.9335  0.9302  0.9295  0.9261  0.9260  0.9249


based on the standardized normal test statistic. For two methods to be compared, let $f_{11}$ denote the number of samples that both methods classify correctly, $f_{22}$ the number of samples that both methods misclassify, $f_{12}$ the number of samples misclassified by

Table 9. The confusion matrix from using all the 126 bands in the HyMap experiment.

                        Ground truth
Classified                 Road  Grass Shadow   Soil   Tree   Roof  No. classified pixels  User's accuracy (%)
Road                        954      0      0      1      0     83                   1038                91.91
Grass                         5   1054      0     42     12     12                   1125                93.69
Shadow                        0      0    207      0     35     81                    323                64.09
Soil                          8      6      0    328      0      1                    343                95.63
Tree                          0     10      3      0   1227      1                   1241                98.87
Roof                        263      2      3      0     47   1058                   1373                77.06
No. ground truth pixels    1230   1072    213    371   1321   1236             OA = 88.70
Producer's accuracy (%)   77.56  98.32  97.18  88.41  92.88  85.60          Kappa = 85.82

    Figure 9. Classification maps in HyMap experiment (with six bands or six clusters).


Table 10. The confusion matrix from using six SKMd(BS) selected bands in the HyMap experiment.

                        Ground truth
Classified                 Road  Grass Shadow   Soil   Tree   Roof  No. classified pixels  User's accuracy (%)
Road                       1196      0      0      3      0    112                   1311                91.23
Grass                         5   1055      0     36     27     13                   1136                92.87
Shadow                        0      0    206      0     93     40                    339                60.77
Soil                          6      7      0    332      0      6                    351                94.59
Tree                          0      9      3      0   1201      0                   1213                99.01
Roof                         23      1      4      0      0   1065                   1093                97.44
No. ground truth pixels    1230   1072    213    371   1321   1236             OA = 92.87
Producer's accuracy (%)   97.24  98.41  96.71  89.49  90.92  86.17          Kappa = 91.07

Table 11. The confusion matrix from using six SKMd selected clusters in the HyMap experiment.

                        Ground truth
Classified                 Road  Grass Shadow   Soil   Tree   Roof  No. classified pixels  User's accuracy (%)
Road                       1171      0      0      0      0    101                   1272                92.06
Grass                         4   1055      1     37     16     13                   1126                93.69
Shadow                        0      1    209      0     92     23                    325                64.31
Soil                         12      6      0    334      0      3                    355                94.08
Tree                          0     10      3      0   1213      1                   1227                98.86
Roof                         43      0      0      0      0   1095                   1138                96.22
No. ground truth pixels    1230   1072    213    371   1321   1236             OA = 93.28
Producer's accuracy (%)   95.20  98.41  98.12  90.03  91.82  88.59          Kappa = 91.57

Table 12. The confusion matrix from using six SKMd-BR selected clusters in the HyMap experiment.

                        Ground truth
Classified                 Road  Grass Shadow   Soil   Tree   Roof  No. classified pixels  User's accuracy (%)
Road                       1184      0      0      0      0    105                   1289                91.85
Grass                         4   1058      2     38     19     14                   1135                93.22
Shadow                        0      1    209      0     80     25                    315                64.35
Soil                          8      6      0    333      0      4                    351                94.87
Tree                          0      7      2      0   1222      0                   1231                99.27
Roof                         34      0      0      0      0   1088                   1122                96.97
No. ground truth pixels    1230   1072    213    371   1321   1236             OA = 93.59
Producer's accuracy (%)   96.26  98.69  98.12  89.76  92.51  88.03          Kappa = 91.96


Table 13. |z| values in the McNemar's test for the HYDICE experiment (the 5% level of significance is selected).

|z| vs. SKMd-BR(10%)        k = 5        6        7        8        9       10       11       12       13       14       15     Mean
SKMd                       4.2426   2.6540   4.0415   6.5327   3.7100   6.3317   5.2325   3.6056   5.6791   3.5301   1.9612   4.3201
SKMd(BS)                   6.0526   7.7782   7.1795   7.4897   2.0103   1.2792   1.3315   5.8207   8.1609   4.6306   0.1961   4.7208


Table 14. |z| values in the McNemar's test for the HyMap experiment (the 5% level of significance is selected).

|z| vs. SKMd-BR(10%)        k = 5        6        7        8        9       10       11       12       13       14       15     Mean
SKMd                       3.8874   6.2152   0.3375   0.3922   3.7417   3.5794   1.2792   2.6458   2.3094   2.7854   2.6458   2.7108
SKMd(BS)                   21.115   24.259   27.585   25.815   25.408   25.406   18.532    3.987   5.4000   0.4336   2.2223  16.3783


method 1 but not method 2, and $f_{21}$ the number of samples misclassified by method 2 but not method 1. The McNemar's test statistic for the two methods is then defined as:

$$z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}} \qquad (3)$$

At the 5% level of significance, the critical |z| value is 1.96; |z| > 1.96 means that the two methods have a significant performance discrepancy. Tables 13 and 14 tabulate the |z| values when SKMd-BR was compared against SKMd and SKMd(BS) with k varied from 5 to 15. The performance of the proposed SKMd-BR is statistically different from the others most of the time (the mean |z| exceeds 1.96 in all four comparisons), and the discrepancy between SKMd-BR and SKMd is smaller than that between SKMd-BR and SKMd(BS).
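Eq. (3) needs only the two off-diagonal counts of the 2 × 2 agreement table. A minimal pure-Python sketch, assuming per-sample labels for both methods on the same test set; the function names are hypothetical, and returning 0 when the methods never disagree is a design choice made here to avoid division by zero.

```python
import math
from typing import Sequence


def mcnemar_z(truth: Sequence, pred1: Sequence, pred2: Sequence) -> float:
    """McNemar z of eq. (3) for two classifiers scored on the same test samples.

    f12: samples misclassified by method 1 but not method 2;
    f21: samples misclassified by method 2 but not method 1.
    Samples both methods get right (f11) or wrong (f22) do not enter z.
    """
    f12 = sum(1 for t, a, b in zip(truth, pred1, pred2) if a != t and b == t)
    f21 = sum(1 for t, a, b in zip(truth, pred1, pred2) if a == t and b != t)
    if f12 + f21 == 0:
        return 0.0          # the methods never disagree about correctness
    return (f12 - f21) / math.sqrt(f12 + f21)


def significant(z: float, critical: float = 1.96) -> bool:
    """Two-sided test at the 5% level: |z| > 1.96."""
    return abs(z) > critical


truth = [1] * 10
pred1 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # wrong on samples 0-4
pred2 = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]   # wrong on samples 4 and 5
z = mcnemar_z(truth, pred1, pred2)        # f12 = 4, f21 = 1
```

Here z = 3/√5 ≈ 1.34, so the difference is not significant at the 5% level even though method 2 makes fewer errors; this is why small accuracy gaps can still give |z| < 1.96 in Tables 13 and 14.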

    4. Conclusion

The combination of band clustering and band selection is investigated for hyperspectral dimensionality reduction. Different from unsupervised clustering using all the pixels or supervised clustering requiring labelled pixels, our semi-supervised band clustering needs class spectral signatures only, so it can significantly reduce computational cost; after clustering, a cluster selection step can further improve the performance of the following data analysis. In this article, we have shown that, through outlier band removal, each cluster centroid better represents the spectral features of its cluster, thereby further enhancing the overall classification accuracy in urban land cover mapping.

    References

Al-Harbi, S.H. and Rayward-Smith, V.J., 2006. Adapting k-means for supervised clustering. Applied Intelligence, 24, 219–226.

Burges, C.J.C., 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121–167.

Chang, C.-I., 2000. An information-theoretic approach to spectral variability, similarity, and discrimination for hyperspectral image analysis. IEEE Transactions on Information Theory, 46 (5), 1927–1932.

Chang, C.-I., 2003. Hyperspectral imaging: techniques for spectral detection and classification. New York: Kluwer Academic/Plenum.

Du, Q. and Yang, H., 2008. Similarity-based unsupervised band selection for hyperspectral image analysis. IEEE Geoscience and Remote Sensing Letters, 5 (4), 564–568.

Foody, G.M., 2004. Thematic map comparison: evaluating the statistical significance of differences in classification accuracy. Photogrammetric Engineering & Remote Sensing, 70 (5), 627–633.

Harsanyi, J.C. and Chang, C.-I., 1994. Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach. IEEE Transactions on Geoscience and Remote Sensing, 32 (4), 779–785.

Martínez-Usó, A., et al., 2007. Clustering-based hyperspectral band selection using information measures. IEEE Transactions on Geoscience and Remote Sensing, 45 (12), 4158–4171.

Mojaradi, B., et al., 2008. A novel band selection method for hyperspectral data analysis. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII, 447–451.

Su, H., et al., 2011. Semi-supervised band clustering for dimensionality reduction of hyperspectral imagery. IEEE Geoscience and Remote Sensing Letters, 8 (6), 1135–1139.
