This article was downloaded by: [Qian Du], on 29 July 2012, at 00:44. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, Registered Number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Geocarto International. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tgei20

Hyperspectral band clustering and band selection for urban land cover classification
Hongjun Su (a) and Qian Du (b)
(a) School of Earth Sciences and Engineering, Hohai University, Nanjing, China
(b) Department of Electrical and Computer Engineering, Mississippi State University, Starkville, USA

Accepted author version posted online: 29 Nov 2011. Version of record first published: 12 Jan 2012.

To cite this article: Hongjun Su & Qian Du (2012): Hyperspectral band clustering and band selection for urban land cover classification, Geocarto International, 27 (5), 395–411.

To link to this article: http://dx.doi.org/10.1080/10106049.2011.643322

PLEASE SCROLL DOWN FOR ARTICLE

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty, express or implied, or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.
Hyperspectral band clustering and band selection for urban land cover classification

Hongjun Su (a) and Qian Du (b)*

(a) School of Earth Sciences and Engineering, Hohai University, Nanjing, China; (b) Department of Electrical and Computer Engineering, Mississippi State University, Starkville, USA
(Received 1 August 2011; final version received 18 November 2011)
The aim of this study is to combine band clustering with band selection for dimensionality reduction of hyperspectral imagery. The performance of dimensionality reduction is evaluated through urban land cover classification accuracy with the dimensionality-reduced data. Different from unsupervised clustering using all the pixels or supervised clustering requiring labelled pixels, the discussed semi-supervised band clustering needs class spectral signatures only; the band selection result is used as the initial condition for band clustering; after clustering, a cluster selection step is applied to select clusters to be used in the following data analysis. In this article, we propose to conduct band selection by removing outlier bands in each cluster before finalizing cluster centres. The experimental results in urban land cover classification show that the proposed algorithm can further enhance support vector machine (SVM)-based classification accuracy.
Keywords: hyperspectral imagery; dimensionality reduction; band clustering; band selection; urban land cover classification
1. Introduction
A hyperspectral imaging sensor collects hundreds of spectral bands with very fine spectral resolution for the same area on the earth. Its abundant spectral information provides the potential for accurate object classification and identification. However, its vast data volume brings about problems in data transmission and storage. In particular, the very high data dimensionality presents a challenge to many traditional image analysis algorithms. Dimensionality reduction of hyperspectral imagery is often achieved by band selection, whose objective is to find a small subset of bands containing important data information. Another approach is band grouping or band clustering. For instance, adjacent bands can be grouped together and a representative of each group can be selected to participate in the following data analysis. Intuitively, adjacent bands can be partitioned uniformly or based on the spectral correlation coefficient. Figure 1 shows a 126 × 126 spectral correlation coefficient matrix, where a bright pixel at location (i, j) means high correlation between the i-th and j-th bands; if the pixel is dark, then the correlation is low. The white blocks along the diagonal line indicate that adjacent bands usually have high
*Corresponding author. Email: [email protected]
Geocarto International
Vol. 27, No. 5, August 2012, 395–411
ISSN 1010-6049 print/ISSN 1752-0762 online
© 2012 Taylor & Francis
http://dx.doi.org/10.1080/10106049.2011.643322
correlation and should be grouped together. However, non-adjacent bands may also have high correlation in Figure 1, as indicated by the presence of white blocks in off-diagonal areas. Thus, non-adjacent bands should be allowed to be grouped together. While most research in the literature copes with band selection, our approach using band clusters can provide better results.
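As an illustration, a correlation matrix such as the one in Figure 1 can be obtained by flattening each band into a vector and taking pairwise correlation coefficients. The sketch below uses a small synthetic cube; the function name and data are ours, not from the article, and NumPy is assumed:

```python
import numpy as np

def band_correlation_matrix(cube):
    """Spectral correlation coefficient matrix of a hyperspectral cube.

    cube: array of shape (rows, cols, L); each band is flattened into an
    N-dimensional vector (N = rows * cols) before computing correlations.
    Returns an (L, L) symmetric matrix; entry (i, j) is the correlation
    between the i-th and j-th bands.
    """
    rows, cols, L = cube.shape
    bands = cube.reshape(rows * cols, L).T   # (L, N): one vector per band
    return np.corrcoef(bands)

# Toy cube: bands 0 and 1 are highly correlated, band 2 is independent.
rng = np.random.default_rng(0)
b1 = rng.normal(size=(8, 8))
cube = np.stack([b1,
                 2.0 * b1 + 0.01 * rng.normal(size=(8, 8)),
                 rng.normal(size=(8, 8))], axis=-1)
R = band_correlation_matrix(cube)
print(R.shape)  # (3, 3)
```

A bright off-diagonal entry (R[i, j] near 1) would mark a pair of non-adjacent but highly correlated bands, matching the white off-diagonal blocks discussed above.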
The typical implementation of clustering is to cluster pixels based on their spectral signatures so as to spatially segment an image scene into many sub-regions. Band clustering is another type of implementation in the spatial domain; in other words, a spectral band is converted into a vector after column- or row-stacking, then these band vectors are clustered into several groups based on their similarity. In Martínez-Usó et al. (2007), two clustering methods, i.e. Ward's linkage strategy using mutual information (WaLuMI) and Ward's linkage strategy using divergence (WaLuDi), were developed, and the finalized clusters were used for band selection.
In our research, we focus on k-means-based band clustering for dimensionality reduction (Mojaradi et al. 2008). One of its drawbacks is that it is sensitive to the initial condition and may be trapped in local optima; different initial conditions may produce different clusters. We have proposed a new initialization technique using the band selection output (Su et al. 2011). Because of its unsupervised nature, k-means clustering may be time-consuming when using all the pixels. In Al-Harbi and Rayward-Smith (2006), k-means was extended to a supervised version, where training samples for each class were required for clustering. However, in practice, it may be difficult to obtain enough training samples; instead, it may be possible to have a spectral signature for each class.
Therefore, we proposed a semi-supervised k-means clustering method that uses class signatures only (a class signature is the representative spectrum of a class) in Su et al. (2011). Instead of using the band closest to the cluster centre, we use band cluster centres for the following data analysis (e.g. detection and classification). We have shown that using cluster centres is better than using selected bands. We also conducted cluster selection and showed that deleting the worst cluster provided better performance. The initial condition is critical to the clustering performance, and our band selection result can be used as initial centres (Du and Yang 2008).
In this article, we propose to conduct band selection by removing outlier bands in each cluster before finalizing cluster centres. The resulting cluster centres can better
Figure 1. Spectral correlation coefficient matrix for a 126-band HyMap image.
represent the key spectral features in corresponding clusters, thereby further enhancing classification performance. Outlier bands are determined by a similarity metric, such as Euclidean distance (EUD), Mahalanobis distance (MD), spectral angle mapper (SAM), or spectral information divergence (SID) (Chang 2000). Here, we adopt orthogonal projection divergence (OPD) (Chang 2003), which shows the best performance for the hyperspectral digital imagery collection experiment (HYDICE) and HyMap urban images in our experiments.
2. Methodology
2.1 Band clustering
Given a set of bands (B1, …, Bl, …, BL), where each band is arranged into an N-dimensional vector (N is the number of pixels), k-means band clustering aims to partition the L bands into k clusters C = {C1, …, Cm, …, Ck} (1 ≤ m ≤ k) so as to minimize the following objective function:

\arg\min_{C} \sum_{m=1}^{k} \sum_{B_l \in C_m} D(B_l, \mu_m) \qquad (1)

where \mu_m is the cluster centre of Cm and D(·, ·) is a distance metric gauging the similarity between a band and the centre of the cluster it is assigned to. Its computational complexity is linearly proportional to the number of pixels N. In order to reduce the complexity, we use class signatures as algorithm input; then the complexity becomes linearly proportional to the number of signatures S (S ≪ N). This approach is denoted as semi-supervised k-means (SKM). When only the band closest to the cluster centre is used in the following analysis, the resulting method is denoted as SKM(BS).
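Under our reading of SKM, each band is represented by its S values across the class signatures, and ordinary k-means iterations are run over these S-dimensional vectors. A minimal sketch (all names, the data layout and the convergence test are illustrative assumptions, not the authors' code):

```python
import numpy as np

def skm_band_clustering(signatures, k, init_bands, n_iter=100):
    """Semi-supervised k-means (SKM) sketch: cluster bands using class
    signatures only.  signatures is (S, L): S class spectra over L bands,
    so band l is the S-dimensional column signatures[:, l].
    init_bands: indices of distinctive bands used as initial centres.
    Returns labels (L,) and centres (k, S)."""
    X = signatures.T                                   # (L, S), one vector per band
    centres = X[list(init_bands)].astype(float).copy()  # (k, S)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # assign each band to its nearest cluster centre (Euclidean D here)
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                       # no band shuffled: converged
        labels = new_labels
        for m in range(k):                              # recompute cluster centres
            if np.any(labels == m):
                centres[m] = X[labels == m].mean(axis=0)
    return labels, centres

# Two class signatures over six bands; bands 0-2 and 3-5 form two groups.
sig = np.array([[1., 1., 1., 5., 5., 5.],
                [0., 0., 0., 4., 4., 4.]])
labels, centres = skm_band_clustering(sig, k=2, init_bands=[0, 3])
```

Because only S signatures are used instead of N pixels, each distance evaluation costs O(S) rather than O(N), which is the complexity reduction the text describes.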
The SKM algorithm is initialized by using distinctive bands as cluster centroids. The idea of unsupervised selection of distinctive bands was presented in Du and Yang (2008). The band selection algorithm is initialized by choosing a pair of bands B1 and B2, leading to a band subset F = {B1, B2}; it then finds a third band B3 that is the most dissimilar to all the bands in the current F by using a certain criterion, resulting in an updated subset F = F ∪ {B3}; the selection step is repeated until the number of bands in F is large enough. Here, the linear prediction (LP) error (i.e. the difference between an original band and its linearly predicted version using the bands in F) is employed as the similarity metric. The band with the maximum LP error is the most dissimilar from those in F and should be selected.
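A minimal sketch of this LP-based sequential selection, assuming each band is a flattened pixel vector and using ordinary least squares for the prediction coefficients (function and variable names are ours, and the initial pair is fixed rather than chosen by a criterion):

```python
import numpy as np

def lp_band_selection(bands, n_select, init_pair=(0, 1)):
    """Sequential band selection by linear prediction (LP) error, in the
    spirit of Du and Yang (2008).  bands is (L, N): each row a flattened
    band.  At each step, the band worst predicted (largest residual) by a
    least-squares affine combination of the already-selected bands is added."""
    L, N = bands.shape
    selected = list(init_pair)
    while len(selected) < n_select:
        A = np.vstack([np.ones(N), bands[selected]]).T   # (N, |F| + 1) design matrix
        best_err, best_band = -1.0, None
        for l in range(L):
            if l in selected:
                continue
            coef, *_ = np.linalg.lstsq(A, bands[l], rcond=None)
            err = np.linalg.norm(bands[l] - A @ coef)    # LP error for band l
            if err > best_err:
                best_err, best_band = err, l
        selected.append(best_band)                       # most dissimilar band
    return selected

# Band 2 is an exact affine combination of bands 0 and 1, so the noisy
# band 3 has the largest LP error and is selected next.
x = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(1)
bands = np.vstack([x, x**2, 2.0 * x + 1.0, rng.normal(size=50)])
sel = lp_band_selection(bands, n_select=3)
```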
After k-means clustering, k clusters with their centroids are ready for further analysis. However, this does not mean that all of them should be used. Some clusters may not be helpful for object classification, and they may even bring about confusion. Thus, we propose to remove a cluster by exhaustively searching for the worst one (when it is removed, the remaining clusters provide the classification maps most similar to those from using all the original bands). It is observed that deleting one cluster usually results in improvement, but deleting more than one cluster may not necessarily provide further improvement. Thus, only one cluster is removed hereafter. The SKM algorithm deleting the worst cluster is denoted as SKMd.
2.2 Outlier band removal (BR)

Within a band cluster, the contribution from each band to class separation is different. For instance, bands far away from the cluster centre may be considered as outlier or anomalous bands, whose spectral features are quite different from the other bands in the same cluster. Based on our experience, such outlier bands should be removed and the cluster centre should be recalculated with the remaining bands. Using the final cluster centres can improve the performance. The resulting algorithm is denoted as SKMd-BR.
A band may be deleted according to a similarity metric. In this article, we adopt OPD (Chang 2003), which is based on the concept of orthogonal subspace projection (Harsanyi and Chang 1994). Let c_j denote the j-th cluster centroid and b_{ij} the i-th band in the j-th cluster. Their OPD value is defined as

\mathrm{OPD}(b_{ij}, c_j) = \left( b_{ij}^{T} P_{c_j}^{\perp} b_{ij} + c_j^{T} P_{b_{ij}}^{\perp} c_j \right)^{1/2} \qquad (2)

where P_{x}^{\perp} = I - x (x^{T} x)^{-1} x^{T} for x = b_{ij}, c_j, and I is an identity matrix. P_{c_j}^{\perp} projects onto the orthogonal subspace of c_j, and b_{ij}^{T} P_{c_j}^{\perp} b_{ij} is the squared norm of the projection of b_{ij} onto that subspace. Similarly, c_j^{T} P_{b_{ij}}^{\perp} c_j is the squared norm of the projection of c_j onto the subspace orthogonal to b_{ij}. A larger OPD value means b_{ij} and c_j are more different, i.e. b_{ij} may be an outlier.
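For concreteness, a small sketch of the OPD computation for vector spectra, using the equivalent residual form x^T P_y^⊥ x = ||x − proj_y(x)||² (our implementation, not the authors' code):

```python
import numpy as np

def opd(b, c):
    """Orthogonal projection divergence between two spectra.

    P_y_perp = I - y (y^T y)^{-1} y^T removes the component along y;
    x^T P_y_perp x equals the squared norm of x minus its projection
    onto y.  OPD is the root of the two residual energies summed."""
    def residual_energy(x, y):
        proj = (x @ y) / (y @ y) * y       # projection of x onto span{y}
        r = x - proj
        return r @ r                        # squared norm of the residual
    return np.sqrt(residual_energy(b, c) + residual_energy(c, b))

v = np.array([1.0, 2.0, 3.0])
print(opd(v, 2.0 * v))  # 0.0: parallel spectra have no orthogonal component
```

Parallel band and centroid give OPD 0 (scaling does not matter), while orthogonal directions maximize it, which is why a large OPD flags a band whose spectral shape deviates from its cluster centroid.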
2.3 The proposed algorithm
The proposed SKMd-BR algorithm can be detailed as follows.
(1) Initialize the algorithm by using k selected distinctive bands.
(2) With the known class signatures, conduct k-means band clustering. The clustering is completed when no band is shuffled from one cluster to another. Compute band cluster centroids by averaging all the bands in each cluster.
(3) Employ OPD to compute pair-wise cluster similarity. The cluster with the largest average OPD is removed. The resulting k−1 clusters are the final band clustering result.
(4) Calculate the OPD value between each band and its cluster centroid. A certain percentage of bands with large OPD values are removed. The k−1 cluster centroids are updated with the remaining bands; these are the final outputs.
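Step (4) might be sketched as below, reusing the OPD metric; the data layout (bands as rows of X), the function names and the rounding of the removal fraction are our assumptions:

```python
import numpy as np

def opd(b, c):
    """OPD between two spectra via residuals orthogonal to each other."""
    def res(x, y):
        return x - (x @ y) / (y @ y) * y
    return np.sqrt(res(b, c) @ res(b, c) + res(c, b) @ res(c, b))

def refine_centres(X, labels, centres, frac=0.10):
    """Within each cluster, drop the top `frac` fraction of bands by OPD
    to the cluster centroid, then recompute that centroid from the
    remaining bands.  X is (L, S) band vectors; labels (L,); centres (k, S)."""
    new_centres = centres.astype(float).copy()
    for m in range(len(centres)):
        idx = np.flatnonzero(labels == m)
        if len(idx) < 2:
            continue                         # nothing sensible to remove
        d = np.array([opd(X[i], centres[m]) for i in idx])
        n_drop = int(np.ceil(frac * len(idx)))
        keep = idx[np.argsort(d)[:len(idx) - n_drop]]   # smallest-OPD bands
        new_centres[m] = X[keep].mean(axis=0)
    return new_centres

# One cluster of four bands; the last band points in a different direction
# and is removed, pulling the centroid back towards the consistent bands.
X = np.array([[1.0, 0.10], [1.0, 0.12], [1.0, 0.08], [0.1, 1.0]])
labels = np.zeros(4, dtype=int)
centres = X.mean(axis=0, keepdims=True)
new_centres = refine_centres(X, labels, centres, frac=0.25)
```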
3. Experiments
Two real-data experiments were conducted. Clustering quality was evaluated with classification accuracy. When training and test samples are available, a support vector machine (SVM) can be applied (Burges 1998). The libSVM library (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) was used for this research. The proposed SKMd-BR was compared against SKMd and SKM(BS), since SKMd could outperform other band clustering methods in our previous work (Su et al. 2011).
3.1 Hyperspectral digital imagery collection experiment
Data collected by the airborne HYDICE sensor were used; the sensor offers 0.4–2.5 μm spectral coverage with 210 bands and 10 nm spectral resolution. The subimage scene
with 304 × 301 pixels over the Washington DC Mall area with about 2.8 m spatial resolution is shown in Figure 2. After bad band removal, 191 bands were used. Six classes are present in this image scene: roof, tree, grass, shadow, road and trail. These six class centres were used for band clustering. The overall accuracy (OA) from SVM was computed with the training and test samples listed in Table 1.
As shown in Figure 3, SKMd-BR provided the best results when k was changed from 5 to 15, where SKMd-BR(10%) indicates that 10% of bands were removed from each cluster. Figure 4 shows the performance variation when different similarity metrics were adopted for SKMd-BR. Obviously, OPD yielded the best results. Table 2 lists the accuracy when 5%, 10% and 15% of bands were removed; no conclusion could be drawn about which percentage was the best. However, the overall performance discrepancy was not critical.
Figure 5 presents classification maps when using six bands or clusters. There was a significant amount of misclassification between trail (in yellow) and roof (in orange).
Figure 2. The image scene used in HYDICE (bands 47 (0.63 μm), 35 (0.55 μm), 15 (0.45 μm)).
With six selected bands, SKMd(BS) could slightly reduce the yellow (trail) areas that were supposed to be in orange as roof (as highlighted in two circles), but SKMd using cluster centres could significantly reduce the yellow (trail) areas. SKMd-BR further enlarged the orange (roof) areas, but not significantly. Tables 3–6 are confusion matrices for the four methods, which more clearly show the improvement in class separation, particularly from SKMd-BR.
Table 1. Training and testing samples used in HYDICE experiment.

Class    Training  Testing
Road        55       892
Grass       57       910
Trail       50       567
Tree        46       624
Shadow      49       656
Roof        52      1123
Total      309      4772
Figure 3. Classification accuracy in HYDICE (SKMd-BR with OPD).
Figure 4. Different similarity metrics used for SKMd-BR in HYDICE.
Table 2. Classification accuracy vs. the percentage of bands removed in each cluster in HYDICE experiment.

k              5       6       7       8       9       10      11      12      13      14      15
SKMd           0.9530  0.9570  0.9468  0.9422  0.9377  0.9392  0.9434  0.9449  0.9432  0.9466  0.9466
SKMd-BR(5%)    0.9560  0.9612  0.9585  0.9528  0.9486  0.9539  0.9568  0.9516  0.9577  0.9541  0.9442
SKMd-BR(10%)   0.9579  0.9608  0.9575  0.9533  0.9476  0.9541  0.9568  0.9528  0.9579  0.9537  0.9436
SKMd-BR(15%)   0.9600  0.9598  0.9570  0.9537  0.9480  0.9547  0.9583  0.9539  0.9577  0.9530  0.9440
3.2 HyMap experiment
Figure 6 shows a 126-band airborne HyMap image (0.45–2.48 μm spectral coverage and about 16 nm spectral resolution) acquired over a residential area near the campus of Purdue University in 1999. The image size is 377 × 512, and the spatial resolution is about 5 m. The image scene includes six classes: {road, grass, shadow, soil, tree, roof}. As listed in Table 7, 404 training samples and 5443 testing samples were available. Compared to the HYDICE image, the roof class in this image was more spectrally homogeneous. However, the road class had within-class spectral variation, particularly in the upper right subdivision.
As shown in Figure 7, SKMd-BR still provided the best results for varied k, but the performance of SKMd was closer to SKMd-BR in this experiment. Figure 8 shows the performance variation with different similarity metrics. As before, OPD still yielded the overall best results. Table 8 lists the accuracy when 5%, 10% and 15% of bands were removed; in this case, 10% removal was the best.
Figure 5. Classification maps in HYDICE (with six bands or six clusters).
Table 3. The confusion matrix from using all the 191 bands in HYDICE experiment.

                              Ground truth
Classified      Road   Grass  Trail  Tree   Shadow  Roof    No. classified pixels  Users accuracy (%)
Road             861       0     69     0       0     32                      962               89.50
Grass              0     882      0     4       6      0                      892               98.88
Trail              1       0    498     0       0      2                      501               99.40
Tree               0       0      0   604       0    125                      729               89.48
Shadow             0      28      0     0     647      0                      675               95.85
Roof              30       0      0    15       3    964                     1012               95.26

No. ground truth pixels:   892   910   567   624   656   1123    OA = 93.40
Producers accuracy (%):  96.52 96.92 87.83 96.79 98.63  85.84    Kappa = 91.97
Table 4. The confusion matrix from using six SKMd(BS) selected bands in HYDICE experiment.

                              Ground truth
Classified      Road   Grass  Trail  Tree   Shadow  Roof    No. classified pixels  Users accuracy (%)
Road             842       0     54     0       0     37                      933               90.25
Grass              0     884      0     5       6      0                      895               98.77
Trail              2       0    513     0       0      0                      515               99.61
Tree               0       1      0   598       0     24                      623               95.99
Shadow             0      24      0     0     650      0                      674               96.44
Roof              48       1      0    20       0   1062                     1131               93.90

No. ground truth pixels:   892   910   567   624   656   1123    OA = 95.35
Producers accuracy (%):  94.39 97.14 90.48 95.83 99.09  94.57    Kappa = 94.32
Table 5. The confusion matrix from using six SKMd selected clusters in HYDICE experiment.

                              Ground truth
Classified      Road   Grass  Trail  Tree   Shadow  Roof    No. classified pixels  Users accuracy (%)
Road             874       0     47     0       0     41                      962               90.85
Grass              0     870      0     9       8      0                      887               98.08
Trail              2       0    520     0       1      0                      523               99.43
Tree               0       0      0   610       0     37                      647               94.28
Shadow             0      40      0     0     647      0                      687               94.18
Roof              16       0      0     4       0   1045                     1065               98.11

No. ground truth pixels:   892   910   567   624   656   1123    OA = 95.70
Producers accuracy (%):  97.98 95.60 91.71 97.76 98.63  93.05    Kappa = 94.76
Figure 9 presents classification maps when using six bands or clusters. The improvement in the vegetation area (highlighted in the circles in cyan) was obvious. In the roof areas circled in blue and magenta, SKMd(BS) could slightly reduce the grey areas (for road) that were misclassified, but SKMd using cluster centres could more significantly reduce the grey (and black) areas. SKMd-BR further enlarged the
Table 6. The confusion matrix from using six SKMd-BR selected clusters in HYDICE experiment.

                              Ground truth
Classified      Road   Grass  Trail  Tree   Shadow  Roof    No. classified pixels  Users accuracy (%)
Road             872       0     44     0       0     41                      957               91.12
Grass              0     861      0    10       7      0                      878               98.06
Trail              2       0    523     0       3      0                      528               99.05
Tree               0       0      0   609       0      9                      618               98.54
Shadow             0      49      0     0     646      0                      695               92.95
Roof              18       0      0     4       0   1073                     1095               98.00

No. ground truth pixels:   892   910   567   624   656   1123    OA = 96.08
Producers accuracy (%):  97.76 94.62 92.24 97.60 98.48  95.55    Kappa = 95.21
Figure 6. The image scene used in HyMap experiment (bands 14 (0.65 μm), 8 (0.55 μm), 2 (0.45 μm)).
orange areas, but not significantly. Tables 9–12 are confusion matrices for the four methods, which better demonstrate the improvement in class separation from SKMd-BR.
3.3 Uncertainty analysis
The non-parametric McNemar's test was deployed to evaluate the statistical significance of the accuracy improvement with the proposed method (Foody 2004). It is
Table 7. Training and testing samples used in HyMap experiment.

Class    Training  Testing
Road        73      1230
Grass       72      1072
Shadow      49       213
Soil        69       371
Tree        67      1321
Roof        74      1236
Total      404      5443
Figure 7. Classification accuracy in HyMap experiment.
Figure 8. Different similarity metrics used for SKMd-BR in HyMap experiment.
Table 8. Classification accuracy vs. the percentage of bands removed in each cluster in HyMap experiment.

k              5       6       7       8       9       10      11      12      13      14      15
SKMd           0.8723  0.8955  0.9324  0.9300  0.9300  0.9269  0.9291  0.9272  0.9252  0.9230  0.9217
SKMd-BR(5%)    0.8826  0.9008  0.9320  0.9295  0.9344  0.9346  0.9295  0.9302  0.9285  0.9256  0.9256
SKMd-BR(10%)   0.8835  0.9023  0.9322  0.9295  0.9359  0.9342  0.9306  0.9306  0.9283  0.9261  0.9247
SKMd-BR(15%)   0.8821  0.9043  0.9302  0.9300  0.9364  0.9335  0.9302  0.9295  0.9261  0.9260  0.9249
based on the standardized normal test statistic. For two methods to be compared, let f11 denote the number of samples that both methods classify correctly, f22 the number of samples that both misclassify, f12 the number of samples misclassified by
Table 9. The confusion matrix from using all the 126 bands in HyMap experiment.

                              Ground truth
Classified      Road   Grass  Shadow  Soil   Tree   Roof    No. classified pixels  Users accuracy (%)
Road             954       0      0      1      0     83                     1038               91.91
Grass              5    1054      0     42     12     12                     1125               93.69
Shadow             0       0    207      0     35     81                      323               64.09
Soil               8       6      0    328      0      1                      343               95.63
Tree               0      10      3      0   1227      1                     1241               98.87
Roof             263       2      3      0     47   1058                     1373               77.06

No. ground truth pixels:  1230  1072   213   371  1321   1236    OA = 88.70
Producers accuracy (%):  77.56 98.32 97.18 88.41 92.88  85.60    Kappa = 85.82
Figure 9. Classification maps in HyMap experiment (with six bands or six clusters).
Table 10. The confusion matrix from using six SKMd(BS) selected bands in HyMap experiment.

                              Ground truth
Classified      Road   Grass  Shadow  Soil   Tree   Roof    No. classified pixels  Users accuracy (%)
Road            1196       0      0      3      0    112                     1311               91.23
Grass              5    1055      0     36     27     13                     1136               92.87
Shadow             0       0    206      0     93     40                      339               60.77
Soil               6       7      0    332      0      6                      351               94.59
Tree               0       9      3      0   1201      0                     1213               99.01
Roof              23       1      4      0      0   1065                     1093               97.44

No. ground truth pixels:  1230  1072   213   371  1321   1236    OA = 92.87
Producers accuracy (%):  97.24 98.41 96.71 89.49 90.92  86.17    Kappa = 91.07
Table 11. The confusion matrix from using six SKMd selected clusters in HyMap experiment.

                              Ground truth
Classified      Road   Grass  Shadow  Soil   Tree   Roof    No. classified pixels  Users accuracy (%)
Road            1171       0      0      0      0    101                     1272               92.06
Grass              4    1055      1     37     16     13                     1126               93.69
Shadow             0       1    209      0     92     23                      325               64.31
Soil              12       6      0    334      0      3                      355               94.08
Tree               0      10      3      0   1213      1                     1227               98.86
Roof              43       0      0      0      0   1095                     1138               96.22

No. ground truth pixels:  1230  1072   213   371  1321   1236    OA = 93.28
Producers accuracy (%):  95.20 98.41 98.12 90.03 91.82  88.59    Kappa = 91.57
Table 12. The confusion matrix from using six SKMd-BR selected clusters in HyMap experiment.

                              Ground truth
Classified      Road   Grass  Shadow  Soil   Tree   Roof    No. classified pixels  Users accuracy (%)
Road            1184       0      0      0      0    105                     1289               91.85
Grass              4    1058      2     38     19     14                     1135               93.22
Shadow             0       1    209      0     80     25                      315               64.35
Soil               8       6      0    333      0      4                      351               94.87
Tree               0       7      2      0   1222      0                     1231               99.27
Roof              34       0      0      0      0   1088                     1122               96.97

No. ground truth pixels:  1230  1072   213   371  1321   1236    OA = 93.59
Producers accuracy (%):  96.26 98.69 98.12 89.76 92.51  88.03    Kappa = 91.96
Table 13. |z| values in McNemar's test for HYDICE experiment (the 5% level of significance is selected).

SKMd-BR(10%) vs.   k=5     6       7       8       9       10      11      12      13      14      15      mean
SKMd               4.2426  2.6540  4.0415  6.5327  3.7100  6.3317  5.2325  3.6056  5.6791  3.5301  1.9612  4.3201
SKMd(BS)           6.0526  7.7782  7.1795  7.4897  2.0103  1.2792  1.3315  5.8207  8.1609  4.6306  0.1961  4.7208
Table 14. |z| values in McNemar's test for HyMap experiment (the 5% level of significance is selected).

SKMd-BR(10%) vs.   k=5     6       7       8       9       10      11      12      13      14      15      mean
SKMd               3.8874  6.2152  0.3375  0.3922  3.7417  3.5794  1.2792  2.6458  2.3094  2.7854  2.6458   2.7108
SKMd(BS)           21.115  24.259  27.585  25.815  25.408  25.406  18.532  3.987   5.4000  0.4336  2.2223  16.3783
method 1 but not method 2, and f21 the number of samples misclassified by method 2 but not method 1. Then McNemar's test statistic for these two methods can be defined as:

z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}}. \qquad (3)
For the 5% level of significance, the corresponding |z| value is 1.96; a |z| value greater than this quantity means the two methods have a significant performance discrepancy. Tables 13 and 14 tabulate the |z| values when SKMd-BR was compared against SKMd and SKMd(BS) with k changed from 5 to 15. Obviously, the performance of the proposed SKMd-BR is statistically different from the others most of the time, and the discrepancy between SKMd-BR and SKMd is less than that between SKMd-BR and SKMd(BS).
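Equation (3) is straightforward to compute; a minimal sketch with illustrative (made-up) discordant counts:

```python
import math

def mcnemar_z(f12, f21):
    """McNemar's test statistic: z = (f12 - f21) / sqrt(f12 + f21),
    where f12 and f21 count the samples misclassified by exactly one
    of the two methods.  |z| > 1.96 rejects equal accuracy at the
    5% significance level."""
    return (f12 - f21) / math.sqrt(f12 + f21)

# Hypothetical counts: method 1 misclassifies 60 samples that method 2
# gets right, and vice versa for 25 samples.
z = mcnemar_z(60, 25)
print(abs(z) > 1.96)  # True: the two methods differ significantly
```

Note that only the discordant counts f12 and f21 enter the statistic; samples that both methods classify identically (f11 and f22) carry no information about which method is better.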
4. Conclusion
The combination of band clustering and band selection is investigated for hyperspectral dimensionality reduction. Different from unsupervised clustering using all the pixels or supervised clustering requiring labelled pixels, our semi-supervised band clustering needs class spectral signatures only, so it is able to significantly reduce the computational cost; after clustering, a cluster selection step can further improve the performance of the following data analysis. In this article, we have shown that, through outlier band removal, the spectral features represented by each cluster centroid can better represent the corresponding cluster, thereby further enhancing the overall classification accuracy in urban land cover mapping.
References
Al-Harbi, S.H. and Rayward-Smith, V.J., 2006. Adapting k-means for supervised clustering. Applied Intelligence, 24, 219–226.

Burges, C.J.C., 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121–167.

Chang, C.-I., 2000. An information-theoretic approach to spectral variability, similarity, and discrimination for hyperspectral image analysis. IEEE Transactions on Information Theory, 46 (5), 1927–1932.

Chang, C.-I., 2003. Hyperspectral imaging: techniques for spectral detection and classification. New York: Kluwer Academic/Plenum.

Du, Q. and Yang, H., 2008. Similarity-based unsupervised band selection for hyperspectral image analysis. IEEE Geoscience and Remote Sensing Letters, 5 (4), 564–568.

Foody, G.M., 2004. Thematic map comparison: evaluating the statistical significance of differences in classification accuracy. Photogrammetric Engineering & Remote Sensing, 70 (5), 627–633.

Harsanyi, J.C. and Chang, C.-I., 1994. Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach. IEEE Transactions on Geoscience and Remote Sensing, 32 (4), 779–785.

Martínez-Usó, A., et al., 2007. Clustering-based hyperspectral band selection using information measures. IEEE Transactions on Geoscience and Remote Sensing, 45 (12), 4158–4171.

Mojaradi, B., et al., 2008. A novel band selection method for hyperspectral data analysis. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII, 447–451.

Su, H., et al., 2011. Semi-supervised band clustering for dimensionality reduction of hyperspectral imagery. IEEE Geoscience and Remote Sensing Letters, 8 (6), 1135–1139.