
Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=trsl20

Remote Sensing Letters

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/trsl20

A mixture generative adversarial network with category multi-classifier for hyperspectral image classification

Hengchao Li, Weiye Wang, Shaohui Ye, Yangjun Deng, Fan Zhang & Qian Du

To cite this article: Hengchao Li, Weiye Wang, Shaohui Ye, Yangjun Deng, Fan Zhang & Qian Du (2020) A mixture generative adversarial network with category multi-classifier for hyperspectral image classification, Remote Sensing Letters, 11:11, 983-992, DOI: 10.1080/2150704X.2020.1804641

To link to this article: https://doi.org/10.1080/2150704X.2020.1804641

Published online: 22 Sep 2020.



A mixture generative adversarial network with category multi-classifier for hyperspectral image classification

Hengchao Li^a, Weiye Wang^a, Shaohui Ye^a, Yangjun Deng^a, Fan Zhang^b and Qian Du^c

^a School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China; ^b College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China; ^c Department of Electrical and Computer Engineering, Mississippi State University, Starkville, USA

ABSTRACT
Hyperspectral image (HSI) classification is one of the core techniques in HSI processing. In order to solve the problem of scarcity of labelled samples, a novel HSI classification framework based on mixture generative adversarial networks (MGAN) is proposed in this letter. Firstly, to overcome the drawback that MGAN cannot be directly applied to classification, a category multi-classifier is introduced into MGAN to conduct the classification task. Since a 3D convolutional neural network (3DCNN) is adopted as the category multi-classifier, the spatial information and local 3D data structure of the HSI can be captured for classification; the proposed framework is therefore named MGAN-3DCNN. Accordingly, a new loss function is constructed. Secondly, since the new loss function defines a tripartite game in which it is difficult to achieve Nash equilibrium, a step-by-step training strategy is designed to solve the related minimax problem. Experiments on two HSI data sets demonstrate that the proposed MGAN-3DCNN greatly alleviates the over-fitting problem and improves the robustness of HSI classification with small-size samples.

ARTICLE HISTORY Received 11 March 2020 Accepted 28 July 2020

1. Introduction

Hyperspectral remote sensing is a typical earth observation technology that can collect images with high spectral resolution. A hyperspectral image (HSI) has a typical three-dimensional structure in which the two spatial dimensions reflect the distribution information of objects in the scene (Deng et al. 2020), and the spectral dimension contains the band information of objects in each pixel. HSI classification, as one of the core techniques in HSI processing (Li et al. 2018; Wang et al. 2020; Zhao and Du 2016), has been widely used in geological exploration, water resource management, military reconnaissance, etc.

In past decades, a variety of machine learning methods have been applied to HSI classification, such as support vector machine (SVM) (Melgani and Bruzzone 2004), k-nearest neighbour (k-NN) (Cover and Hart 1967), random forest (Ham et al. 2005), etc. Recently, since deep learning has the characteristics of automatic feature learning and hierarchical information representation (LeCun et al. 1998; Sun et al. 2016; Zhao et al. 2020), many HSI classification methods based on deep learning have been proposed. Specifically, Li et al. proposed an HSI classification model based on deep belief network (DBN) (Li, Zhang, and Zhang 2014), which achieved good performance in spatial-spectral classification. Chen et al. extracted HSI features with a stacked auto-encoder (SAE) and classified them with a logistic regression classifier (Chen et al. 2014). The classification method proposed in (Yang et al. 2016) first reduced the dimensionality of HSIs and then extracted spatial features with a two-channel deep convolutional neural network (2DCNN). The 3D convolutional neural network (3DCNN) utilized in (Li, Zhang, and Shen 2017) can simultaneously extract the spatial and spectral information of hyperspectral data without dimensionality reduction. In (Zhu et al. 2018), Zhu et al. first employed a generative adversarial network (GAN) (Goodfellow et al. 2014) to classify HSIs and generate synthetic samples. In particular, the discriminator trained on synthetic and real samples can obtain more abstract features for decision-making, which greatly improves the final classification performance. By using the generated, unlabelled, and labelled samples to train the discriminator simultaneously, Xue proposed a semi-supervised deep convolutional GAN (SS-DCGAN) (Xue 2020) to further improve the classification performance of HSI.

CONTACT Yangjun Deng [email protected] School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China

© 2020 Informa UK Limited, trading as Taylor & Francis Group

As is well known, labelling HSI samples is costly and time-consuming. Furthermore, the labelling process generally requires professional guidance. As such, labelled HSI samples are very limited, which poses great challenges to the construction of supervised classification models and to parameter estimation. As previously mentioned, a GAN can generate samples to address the problem of insufficient data samples. However, GAN often suffers from mode collapse and the difficulty of overall optimization. The mixture generative adversarial network (MGAN) (Hoang et al. 2018) is a good solution to these problems; it is a new approach to training GANs that employs multiple generators instead of a single one. The discriminator determines whether samples are true data or generated by the generators, and the classifier ascertains which generator a sample comes from. Moreover, a parameter-sharing scheme greatly reduces the number of parameters in the MGAN model.

In this letter, a novel HSI classification framework based on MGAN, named MGAN-3DCNN, is proposed. Firstly, to extend MGAN directly for the classification purpose, a category multi-classifier is integrated into MGAN to conduct the classification task, and a new loss function is formulated. Secondly, a step-by-step training strategy is designed to solve for the new loss function, which effectively achieves convergence of the loss. Experiments demonstrate that the proposed algorithm can efficiently prevent the over-fitting problem and significantly improve the robustness of HSI classification with small-size samples.

2. Methodology

2.1. Generative adversarial network

The core concept of GAN (Goodfellow et al. 2014) comes from the Nash equilibrium of game theory. A GAN consists of a generator and a discriminator. The goal of the generator is to learn the underlying distribution of real data and generate samples that conform to that distribution. The discriminator is trained alternately with real and generated samples; its training improves its ability to distinguish real from generated samples. Each iteration is a game between the generator and the discriminator. In order to obtain a generator and a discriminator with excellent performance, the model needs to be continuously optimized; the optimization goal is to find the Nash equilibrium between the generator and the discriminator. The objective function of GAN is a minimax problem, which can be presented as

$$\min_{G}\;\max_{D}\; V(D, G) = \mathbb{E}_{x \sim P_{\mathrm{data}}(x)}\!\left[\ln D(x)\right] + \mathbb{E}_{z \sim P_{z}(z)}\!\left[\ln\left(1 - D(G(z))\right)\right] \qquad (1)$$

where V(·) denotes the objective function, E is the expectation operator, x is a sample from the data distribution P_data, z is a sample from the prior distribution P_z, and the mapping G(z) induces a generator distribution P_model in the data space. GAN alternately optimizes the generator G and the discriminator D by stochastic-gradient-based learning. G attempts to map each z to a corresponding x so as to maximize its probability of being judged as true data, which is the main cause of the mode-collapse problem. At the optimum of D, minimizing over G is equivalent to minimizing the Jensen-Shannon (JS) divergence between the data distribution and the model distribution, resulting in little difference between the samples produced by the generator.
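The relationship between the optimal discriminator and the JS divergence can be checked numerically. The sketch below uses toy discrete distributions (chosen only for illustration): at the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_model(x)), the value function equals 2 JSD(P_data, P_model) − 2 ln 2.

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence between discrete distributions
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jsd(p, q):
    # Jensen-Shannon divergence
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.7, 0.2, 0.1])   # toy "real" distribution
p_model = np.array([0.1, 0.3, 0.6])  # toy "generator" distribution

d_star = p_data / (p_data + p_model)  # optimal discriminator D*(x)
value = np.sum(p_data * np.log(d_star)) + np.sum(p_model * np.log(1 - d_star))

# V(D*, G) = 2 * JSD(P_data, P_model) - 2 ln 2
assert np.isclose(value, 2 * jsd(p_data, p_model) - 2 * np.log(2))
```

Because JSD is non-negative and zero only when the two distributions coincide, V(D*, G) is minimized exactly when P_model matches P_data, which is the equivalence the paragraph above states.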

2.2. Proposed MGAN-3DCNN

Different from GAN, MGAN (Hoang et al. 2018) replaces the single distribution in GAN with a mixture of multiple distributions to approximate the true data distribution, which increases the divergence between the generators so that the mixture distribution can cover different data patterns. Moreover, the number of model parameters is greatly reduced in MGAN by adopting a weight-sharing strategy. At present, MGAN has been widely applied to address the small-sample problem. However, MGAN is only applied for sample augmentation in the small-sample problem and cannot be directly used for classification.
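The mixture-of-generators idea can be sketched in a few lines. The "generators" below are hypothetical affine stand-ins, not the actual MGAN networks; the sketch only shows how a sample is drawn from a mixture of K generator distributions via a generator index drawn from a multinomial distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
pi = np.array([0.5, 0.3, 0.2])  # mixture coefficients, must sum to 1

# Hypothetical generators: simple affine maps standing in for networks,
# each shifting the noise to a different region (a different "mode").
generators = [lambda z, k=k: z + 10.0 * k for k in range(K)]

def sample_mixture(n, z_dim=100):
    u = rng.choice(K, size=n, p=pi)          # generator index u ~ mult(pi)
    z = rng.standard_normal((n, z_dim))      # noise z ~ P_z
    x = np.stack([generators[k](z[i]) for i, k in enumerate(u)])
    return x, u

x, u = sample_mixture(1000)
```

Each generator only has to cover its own mode, so the mixture as a whole can cover a multi-modal data distribution that a single generator would tend to collapse onto.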

To take full advantage of MGAN in sample generation, this letter constructs a novel MGAN-based classification framework. Accordingly, a category multi-classifier is integrated into MGAN to achieve classification. By using 3DCNN as the category multi-classifier, a new MGAN-3DCNN network model is constructed, as shown in Figure 1. It consists of K generators (which induce a mixture over K distributions), a true/false discriminator (which identifies whether a sample is real or fake), a source multi-classifier (which classifies generated samples and indicates which generator a generated sample came from), and a category multi-classifier (for distinguishing the samples' categories). G_u(z) denotes the output of the K generators, where the index u is drawn from a multinomial distribution mult(π) with π = [π_1, π_2, ..., π_K] the coefficients of the mixture. Table 1 lists the detailed network structure of MGAN-3DCNN. Correspondingly, a new cost function is formulated for MGAN-3DCNN. Similar to GAN, MGAN-3DCNN also optimizes a minimax problem, whose objective function is expressed as

$$\min_{G_{1:K},\,C}\;\max_{D}\; \mathcal{J}(G_{1:K}, C, D) = \mathbb{E}_{x \sim P_{\mathrm{data}}}\!\left[\ln D(x)\right] + \mathbb{E}_{x \sim P_{\mathrm{model}}}\!\left[\ln\left(1 - D(x)\right)\right] - \beta\left\{\sum_{k=1}^{K} \pi_{k}\, \mathbb{E}_{x \sim P_{G_{k}}}\!\left[\ln C_{k}(x)\right]\right\} + \gamma\left\{\mathbb{E}_{x \sim \{P_{\mathrm{data}},\, P_{\mathrm{model}}\}}\!\left[\,y \ln \hat{y}\,\right]\right\} \qquad (2)$$


where C_k(x) is the probability that the sample x comes from generator G_k, β > 0 is the diversity hyperparameter, y denotes the label of a sample, ŷ represents the estimated label, and γ is a balance parameter. The first two terms describe the game between the generators and the discriminator. The third term is the softmax loss aiming at maximizing the entropy of the source multi-classifier; it represents the game between the generators and the source multi-classifier, encouraging each generator to differ in mode from the others. The last term describes the category classification of the model, which produces the final labels by finding the maximum probability in the output probability vector. By doing so, the generated multi-modal samples and the real data can be effectively used by the softmax classifier to improve classification performance.
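The three ingredients of objective (2) can be written out as plain functions of hypothetical network outputs. This is a sketch of the loss terms only, not the authors' implementation; the argument names are illustrative.

```python
import numpy as np

def discriminator_terms(d_real, d_fake):
    # E_{x~P_data}[ln D(x)] + E_{x~P_model}[ln(1 - D(x))]
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def diversity_term(pi, c_probs):
    # -sum_k pi_k E_{x~P_{G_k}}[ln C_k(x)]; c_probs[k] holds the source
    # multi-classifier's probability C_k(x) for samples from generator k.
    # (The beta weight is applied outside this function.)
    return -sum(p * np.mean(np.log(c)) for p, c in zip(pi, c_probs))

def classification_term(y_true_onehot, y_pred_probs):
    # E[y ln y_hat]; the gamma weight is applied outside this function.
    return np.mean(np.sum(y_true_onehot * np.log(y_pred_probs), axis=1))
```

For instance, with an undecided discriminator that outputs 0.5 everywhere, the first two terms sum to −2 ln 2, matching the GAN value at the data-matching optimum discussed in Section 2.1.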

Since the objective function defines a tripartite game in which it is difficult to reach Nash equilibrium using traditional methods, a step-by-step training strategy is designed to solve the minimax problem. In other words, the parameters in D, C, and G_{1:K} are updated alternately until the iteration termination condition is reached. In summary, the proposed training strategy includes the following three steps.

Step 1: Fixing the category multi-classifier (i.e., setting γ = 0), adversarial training among the generators, the discriminator, and the source multi-classifier is conducted by optimizing (2). This process minimizes the JS divergence between the generated distribution and the true distribution, while simultaneously maximizing the JS divergence between the individual generators, until Nash equilibrium is achieved. That allows each generator to be as different as possible while the generated mixture distribution remains consistent with the real distribution. Thus, a multi-modal, identically distributed sample generation model is established in this step.

Figure 1. MGAN-3DCNN architecture.

Table 1. The network structure of the MGAN-3DCNN.

| Operation | Kernel | Strides | Feature maps | BN? | BN centre? | Nonlinearity | Share weights? |
|---|---|---|---|---|---|---|---|
| G(z) — input: noise of dimension 100 | | | | | | | |
| Fully connected | 4 × 4 × 512 | – | – | Yes | No | ReLU | No |
| Transposed convolution | 5 × 5 | 2 × 2 | 256 | Yes | No | ReLU | Yes |
| Transposed convolution | 5 × 5 | 2 × 2 | 128 | Yes | No | ReLU | Yes |
| Transposed convolution | 5 × 5 | 2 × 2 | 3 | No | No | Tanh | Yes |
| C(x), D(x) — input: 48 × 48 × 3 | | | | | | | |
| Convolution | 5 × 5 | 2 × 2 | 128 | Yes | Yes | Leaky ReLU | Yes |
| Convolution | 5 × 5 | 2 × 2 | 256 | Yes | Yes | Leaky ReLU | Yes |
| Convolution | 5 × 5 | 2 × 2 | 512 | Yes | Yes | Leaky ReLU | Yes |
| Fully connected | 10 / 1 | – | – | No | No | Softmax / Sigmoid | No |
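As a rough illustration of the weight-sharing saving mentioned in Section 2.2, the transposed-convolution weights of the generator in Table 1 can be counted (biases and batch-norm parameters are ignored, and K = 16 as in the Indian Pines setting; the counts are illustrative, not the authors' figures).

```python
def conv_params(kh, kw, c_in, c_out):
    # weight count of a (transposed) convolution, biases ignored
    return kh * kw * c_in * c_out

# The three shared transposed-convolution layers of G(z) in Table 1:
# 512 -> 256, 256 -> 128, 128 -> 3 channels, all with 5x5 kernels.
shared = (conv_params(5, 5, 512, 256)
          + conv_params(5, 5, 256, 128)
          + conv_params(5, 5, 128, 3))

K = 16  # number of generators used for Indian Pines (Table 2)
without_sharing = K * shared  # each generator owning its own layers
with_sharing = shared         # one copy of the layers serves all K
```

With 16 generators, sharing these layers keeps roughly 4.1 million weights instead of about 65.7 million, which is why MGAN remains practical with many generators.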

Step 2: Generating high-quality synthetic samples with the sample generation model established in Step 1, and marking all of the generated samples as the (N+1)-th class, where N is the total number of categories in the original data set. In this way, these labelled fake samples can be regarded as augmentation data for training the network.
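Step 2 amounts to a simple array operation; in the sketch below, random arrays stand in for real HSI patches and MGAN outputs, using the Indian Pines setting (N = 16 classes, m = 10 real samples per class, 27 × 27 × 3 patches).

```python
import numpy as np

N = 16                                      # classes in Indian Pines
real_x = np.random.randn(160, 27, 27, 3)    # m = 10 patches per class
real_y = np.repeat(np.arange(N), 10)        # labels 0 .. N-1
fake_x = np.random.randn(320, 27, 27, 3)    # synthetic patches from the MGAN
fake_y = np.full(len(fake_x), N)            # all marked as the (N+1)-th class

# Augmented training set fed to the category multi-classifier
train_x = np.concatenate([real_x, fake_x])
train_y = np.concatenate([real_y, fake_y])
```

Keeping the synthetic samples in their own (N+1)-th class lets the classifier benefit from their features without their labels contaminating the real categories.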

Step 3: Fixing the generators and the discriminator, training the category multi-classifier with both the generated and the real samples to produce the final prediction labels. In particular, the generated (N+1)-th-class synthetic samples are added to the real samples as a supplement and jointly fed into the category multi-classifier, improving the discriminative ability of 3DCNN under limited training samples. In addition, to ensure that the generated samples come from different generators and retain multi-modal characteristics, the generated samples are updated after the category classifier has been trained for p iterations. Therefore, the category classifier can better distinguish the real data space and obtain stable classification performance.
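The interaction between classifier training and sample refreshing in Steps 2-3 can be sketched structurally. The update functions below are placeholders for the real gradient steps, and the iteration counts are illustrative.

```python
def train_category_classifier(iters, p, update_classifier, regenerate):
    """Steps 2-3 in outline: train the category multi-classifier on
    real + generated data, refreshing the generated pool every p
    iterations so that it stays multi-modal."""
    pool = regenerate()                  # Step 2: initial synthetic pool
    for t in range(1, iters + 1):
        update_classifier(pool)          # Step 3: one classifier update
        if t % p == 0:
            pool = regenerate()          # refresh the (N+1)-th-class pool
    return pool

# Instrumented dummy updates to show the schedule.
calls = {"update": 0, "regen": 0}
train_category_classifier(
    iters=300, p=100,
    update_classifier=lambda pool: calls.__setitem__("update", calls["update"] + 1),
    regenerate=lambda: calls.__setitem__("regen", calls["regen"] + 1) or calls["regen"],
)
```

With 300 iterations and p = 100, the classifier is updated 300 times while the synthetic pool is drawn 4 times (once initially, then after iterations 100, 200, and 300).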

3. Experimental results and discussion

In order to verify the effectiveness of the proposed MGAN-3DCNN, experiments are conducted on two HSI data sets. The first data set, 'Indian Pines', was gathered by the AVIRIS sensor over the Indian Pines test site in north-western Indiana, United States, in June 1992. It consists of 145 × 145 pixels with 224 spectral reflectance bands in the wavelength range 400–2500 nm, and the spatial resolution is 20 m. After removing bands covering the water-absorption region, 200 bands remain in this scene. Sixteen land-cover classes are included in the Indian Pines scene. Figure 2(a,b) shows the false-colour map of the Indian Pines image and the ground truth, respectively. The second data set, 'Kennedy Space Centre (KSC)', was collected by AVIRIS over the KSC in Florida, United States, on 23 March 1996. It consists of 512 × 614 pixels with 224 spectral reflectance bands in the wavelength range 400–2500 nm, and the spatial resolution is 18 m. After removing water-absorption and low-SNR bands, 176 bands were used for the analysis. The KSC scene consists of 314,368 pixels and 13 land-cover classes. Figure 3(a,b) shows the false-colour map of the KSC image and the corresponding ground truth, respectively.

To verify the improvement in classification performance under small-size samples, m (a small number in general) samples are randomly selected from each class as training samples, and the rest are used for testing. For the Indian Pines data set, m = 10; for the KSC data set, m = 20. To ensure the quality of the generated samples and to reduce computation, the first three principal components are extracted for subsequent analysis. To capture sufficient spatial information, neighbourhood windows of 27 × 27 are used for both data sets; therefore, the size of each sample is 27 × 27 × 3.
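This pre-processing can be sketched with plain NumPy: project the spectral dimension onto its first three principal components, then cut a 27 × 27 neighbourhood around each pixel. The mirror padding at the image border is our assumption; the letter does not state how edge pixels are handled.

```python
import numpy as np

def pca_reduce(cube, n_components=3):
    # Project each pixel's spectrum onto the top principal components.
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)
    flat -= flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False)            # band covariance matrix
    vals, vecs = np.linalg.eigh(cov)            # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return (flat @ top).reshape(h, w, n_components)

def extract_patch(img, row, col, win=27):
    # 27 x 27 neighbourhood centred at (row, col); borders mirror-padded.
    r = win // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    return padded[row:row + win, col:col + win, :]

cube = np.random.rand(145, 145, 200)   # stand-in for the Indian Pines cube
reduced = pca_reduce(cube)
patch = extract_patch(reduced, 0, 0)
```

Each pixel thus yields one 27 × 27 × 3 sample, matching the input size stated above.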

In the experiments, SVM, k-NN, ACGAN, SS-DCGAN, and 3DCNN are used for comparison. The classification results are evaluated by the overall accuracy (OA), average accuracy (AA), and kappa coefficient (κ); means and standard deviations were calculated over 10 repeated experiments with randomly selected training samples. For the proposed MGAN-3DCNN method, the hyperparameter combination is summarized in Table 2. To demonstrate the improvement in classification performance achieved by MGAN-3DCNN, the parameters of the 3DCNN in MGAN-3DCNN are set the same as those in the standalone 3DCNN. The parameters of SVM on the Indian Pines data set are an RBF kernel, balanced class weights, kernel parameter γ = 0.02, a lower-limit percentage of support vectors of 0.01, and a stopping tolerance of 0.0001; on the KSC data set, an RBF kernel, balanced class weights, γ = 0.1, a lower-limit percentage of support vectors of 0.35, and a stopping tolerance of 0.0001. The value of K in k-NN is chosen from the set {1, 2, 3, 4, 5, 6, 7, 23, 24}. The generator in ACGAN is a 4-layer 2D deconvolution network whose convolution kernels are 128 × 3 × 3, 64 × 3 × 3, 10 × 3 × 3, and 10 × 2 × 2, respectively; its discriminator is a 3D convolution network with kernels of 32 × 5 × 5 × 1, 64 × 3 × 3 × 1, and 128 × 3 × 3 × 3. In SS-DCGAN, the generator is a 4-layer 2D deconvolution network with kernels of 512 × 5 × 5, 256 × 5 × 5, 128 × 5 × 5, and 5 × 4 × 4, respectively; the discriminator is a 3D deep ResNet with two residual blocks and seven convolutional layers, whose kernels are 128 × 5 × 5 × 5, 128 × 5 × 5 × 5, 128 × 5 × 5 × 5, 128 × 5 × 5 × 5, 128 × 3 × 3 × 1, 128 × 3 × 3 × 1, and 128 × 3 × 3 × 1. The 3DCNN method uses a two-layer 3D convolutional neural network on both data sets: the first layer has 32 convolution kernels of size 7 × 7 × 3, and the second layer has 64 kernels of size 3 × 3 × 1. The outputs of the pooling layer and the fully connected layer use dropout rates of 0.25 and 0.5, respectively. For the Indian Pines data set, the batch size is set to 160; for the KSC data set, it is set to 130.

Figure 2. Comparison of classification results on Indian Pines. (a) False-colour composite, (b) ground truth map, obtained using (c) SVM, (d) k-NN, (e) ACGAN, (f) SS-DCGAN, (g) 3DCNN, and (h) MGAN-3DCNN.

Figure 3. Comparison of classification results on KSC. (a) False-colour composite, (b) ground truth map, obtained using (c) SVM, (d) k-NN, (e) ACGAN, (f) SS-DCGAN, (g) 3DCNN, and (h) MGAN-3DCNN.
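The three measures reported in Tables 3 and 4 can all be computed from a confusion matrix; a minimal sketch:

```python
import numpy as np

def metrics(y_true, y_pred, n_classes):
    # Build the confusion matrix: rows = true class, columns = predicted.
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    oa = np.trace(cm) / cm.sum()                    # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))      # average per-class accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / cm.sum() ** 2
    kappa = (oa - pe) / (1 - pe)                    # Cohen's kappa
    return oa, aa, kappa

# Tiny worked example: one of four samples is misclassified.
oa, aa, kappa = metrics([0, 0, 1, 1], [0, 0, 1, 0], n_classes=2)
```

OA weights every sample equally, AA weights every class equally, and κ discounts the agreement expected by chance, which is why the three can rank methods differently when class sizes are imbalanced.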

Tables 3 and 4 show the classification results on the Indian Pines and KSC data sets, respectively. The OA, AA, and κ obtained by MGAN-3DCNN are clearly higher than those of SVM, k-NN, ACGAN, SS-DCGAN, and 3DCNN. Compared with 3DCNN, the classification performance and stability of MGAN-3DCNN are significantly improved by using the MGAN with a category multi-classifier. Especially for the KSC data set, whose sparse sample distribution and complex spatial information make over-fitting more likely with small-size samples, MGAN-3DCNN still performs better than the five comparative methods, which demonstrates that the proposed classification framework takes more intrinsic information of the data into consideration. In general, classification of KSC is a more difficult task since the KSC data set has small homogeneous areas, which is why the classification performance on Indian Pines is better than that on KSC in most cases. However, benefiting from the strong discriminability of 3DCNN, which can eliminate the influence of heterogeneity, both 3DCNN and MGAN-3DCNN achieve close classification performance on the two data sets. Furthermore, because MGAN-3DCNN makes the generated samples more similar to the real ones by controlling the sample generation, it shows better classification performance. On the contrary, the uncontrolled sample generation in ACGAN and SS-DCGAN may have a negative impact when the data have weak homogeneity.

Table 2. The hyperparameter setting of MGAN-3DCNN for the Indian Pines and KSC data sets.

| Hyperparameter | Indian Pines | KSC |
|---|---|---|
| Noise dimension | 100 | 100 |
| Noise prior distribution | Normal | Normal |
| Multi-classifier coefficient | 1.0 | 1.0 |
| Number of generators | 16 | 13 |
| Generator network layers | 3 | 3 |
| Generator batch size | 5 | 5 |
| Number of generator features | 128 | 128 |
| Discriminator network layers | 3 | 3 |
| Discriminator batch size | 80 | 65 |
| Number of discriminator features | 128 | 128 |
| Real samples : generated samples | 1 | 2 |
| Update frequency of generated samples | Every 100 iterations | Every 100 iterations |
| Learning rate | 0.0002 | 0.0002 |

Figures 2 and 3 show the classification maps of the Indian Pines and KSC data sets obtained by all considered methods, respectively. Intuitively, the classification map obtained by MGAN-3DCNN is smooth overall; in particular, there are fewer misclassified pixels, and the map is closer to the true distribution of ground objects.

Table 3. Classification accuracy of the Indian Pines data set obtained by using different methods (m = 10).

| Class # | SVM | k-NN | ACGAN | SS-DCGAN | 3DCNN | MGAN-3DCNN |
|---|---|---|---|---|---|---|
| 1 | 95.83 ± 4.17 | 97.92 ± 1.39 | 97.92 ± 2.66 | 100 ± 0.00 | 98.61 ± 2.78 | 98.61 ± 1.60 |
| 2 | 54.97 ± 7.01 | 38.95 ± 6.19 | 55.13 ± 8.60 | 45.77 ± 10.57 | 55.36 ± 9.31 | 52.13 ± 9.20 |
| 3 | 61.31 ± 5.05 | 52.9 ± 4.30 | 62.04 ± 14.96 | 53.26 ± 11.77 | 64.02 ± 13.43 | 66.71 ± 7.65 |
| 4 | 78.41 ± 4.45 | 73.68 ± 9.79 | 74.78 ± 5.10 | 81.50 ± 8.22 | 75.99 ± 11.06 | 71.15 ± 11.67 |
| 5 | 68.5 ± 6.13 | 67.49 ± 5.34 | 70.3 ± 8.91 | 53.75 ± 5.38 | 72.67 ± 7.61 | 70.24 ± 7.76 |
| 6 | 71.04 ± 2.29 | 78.44 ± 4.29 | 70.03 ± 4.31 | 73.68 ± 9.98 | 68.58 ± 3.87 | 66.35 ± 2.60 |
| 7 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 |
| 8 | 94.44 ± 4.95 | 96.96 ± 4.42 | 93.48 ± 3.58 | 100 ± 0.00 | 95.73 ± 6.08 | 96.58 ± 4.01 |
| 9 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 | 100 ± 0.00 |
| 10 | 64.79 ± 2.75 | 65.23 ± 4.5 | 67.62 ± 4.50 | 68.69 ± 11.46 | 70.11 ± 2.51 | 69.36 ± 4.93 |
| 11 | 63.18 ± 5.43 | 65.55 ± 8.62 | 53.42 ± 8.94 | 68.66 ± 8.02 | 63.45 ± 2.02 | 70.59 ± 2.87 |
| 12 | 75.69 ± 7.14 | 48.28 ± 6.39 | 62.78 ± 8.65 | 61.15 ± 10.72 | 70.33 ± 4.76 | 77.14 ± 3.36 |
| 13 | 94.49 ± 2.77 | 96.15 ± 2.95 | 98.21 ± 1.07 | 98.72 ± 1.38 | 96.28 ± 1.75 | 97.95 ± 1.83 |
| 14 | 81.65 ± 5.00 | 74.74 ± 7.30 | 90.04 ± 6.12 | 84.88 ± 4.01 | 86.04 ± 5.37 | 85.38 ± 5.74 |
| 15 | 90.29 ± 2.85 | 85.37 ± 5.44 | 90.36 ± 7.68 | 84.11 ± 2.82 | 86.17 ± 4.32 | 83.71 ± 2.90 |
| 16 | 90.66 ± 8.75 | 91.57 ± 9.74 | 100 ± 0.00 | 98.8 ± 0.85 | 99.1 ± 1.15 | 99.4 ± 1.20 |
| OA (%) | 69.71 ± 3.17 | 65.3 ± 2.13 | 68.04 ± 2.99 | 68.78 ± 1.48 | 70.78 ± 2.89 | 72.12 ± 0.65 |
| AA (%) | 80.33 ± 1.47 | 77.08 ± 0.77 | 80.38 ± 1.83 | 79.56 ± 1.96 | 81.4 ± 1.56 | 81.58 ± 0.27 |
| κ × 100 | 65.84 ± 3.57 | 60.66 ± 2.14 | 64.04 ± 3.37 | 64.70 ± 1.83 | 67.17 ± 3.20 | 68.51 ± 0.69 |

Table 4. Classification accuracy of the KSC data set obtained by using different methods (m = 20).

| Class # | SVM | k-NN | ACGAN | SS-DCGAN | 3DCNN | MGAN-3DCNN |
|---|---|---|---|---|---|---|
| 1 | 85.83 ± 1.44 | 96.52 ± 1.05 | 68.56 ± 45.62 | 11.67 ± 22.54 | 79.79 ± 4.81 | 82.12 ± 2.17 |
| 2 | 30.49 ± 7.91 | 9.53 ± 2.12 | 65.7 ± 3.46 | 61.66 ± 9.48 | 70.85 ± 1.90 | 75.45 ± 2.08 |
| 3 | 39.51 ± 3.89 | 71.82 ± 5.59 | 28.5 ± 4.89 | 37.50 ± 7.20 | 66.42 ± 4.53 | 75.53 ± 1.74 |
| 4 | 11.53 ± 5.43 | 15.09 ± 4.60 | 55.71 ± 3.65 | 57.33 ± 3.81 | 56.79 ± 6.53 | 63.47 ± 5.24 |
| 5 | 81.74 ± 2.61 | 36.17 ± 3.28 | 96.28 ± 2.34 | 94.68 ± 7.95 | 96.99 ± 2.27 | 94.33 ± 5.40 |
| 6 | 55.86 ± 3.71 | 54.19 ± 0.46 | 50.36 ± 20.34 | 76.44 ± 20.91 | 87.2 ± 3.93 | 87.2 ± 0.99 |
| 7 | 0.00 ± 0.00 | 2.65 ± 2.43 | 81.18 ± 18.45 | 87.94 ± 7.20 | 82.94 ± 4.56 | 88.82 ± 3.66 |
| 8 | 64.78 ± 6.00 | 46.17 ± 1.09 | 18.31 ± 3.52 | 43.61 ± 20.84 | 59.12 ± 10.35 | 61.5 ± 6.34 |
| 9 | 47.65 ± 1.55 | 91.3 ± 2.07 | 31.7 ± 2.76 | 78.15 ± 21.49 | 66.35 ± 9.48 | 73.6 ± 10.68 |
| 10 | 67.84 ± 4.69 | 7.16 ± 3.78 | 72.92 ± 7.06 | 73.31 ± 12.41 | 69.73 ± 3.64 | 72.07 ± 6.44 |
| 11 | 45.74 ± 7.10 | 0.44 ± 0.52 | 84.9 ± 3.50 | 88.41 ± 1.67 | 71.49 ± 2.51 | 70.18 ± 1.24 |
| 12 | 63.61 ± 1.74 | 0.00 ± 0.00 | 81.68 ± 6.67 | 71.01 ± 13.14 | 78.05 ± 10.17 | 77.43 ± 10.7 |
| 13 | 41.12 ± 0.27 | 99.97 ± 0.06 | 77.01 ± 3.40 | 93.58 ± 5.02 | 63.59 ± 6.95 | 62.35 ± 8.49 |
| OA (%) | 54.21 ± 1.37 | 54.33 ± 0.19 | 62.74 ± 6.26 | 64.83 ± 4.89 | 70.87 ± 0.78 | 72.91 ± 1.06 |
| AA (%) | 48.9 ± 1.75 | 40.85 ± 0.10 | 62.52 ± 3.00 | 67.33 ± 4.03 | 73.02 ± 0.62 | 75.7 ± 1.10 |
| κ × 100 | 49.43 ± 1.54 | 48.74 ± 0.28 | 58.41 ± 6.28 | 61.02 ± 5.38 | 67.78 ± 0.82 | 70.06 ± 1.13 |


For the Indian Pines data set, MGAN-3DCNN outperforms the other five methods on class 11 according to Table 3 and Figure 2, and overall it offers a larger average OA. As shown in Table 4 and Figure 3, the classification performance of MGAN-3DCNN on the KSC data set is superior to that of the five comparative methods on class 2, class 3, and class 7. In addition, k-NN achieves the best classification performance for classes with many samples, such as class 1, class 9, and class 13.

4. Conclusions

In this letter, a novel MGAN-based HSI classification method has been proposed. Specifically, through the adversarial training of the multiple generators and the discriminator of the MGAN, multi-modal and identically distributed samples are generated, and the problem of mode collapse is effectively alleviated. To solve the problem that MGAN cannot be used for classification directly, the MGAN-3DCNN framework integrates MGAN with a category multi-classifier for the classification task. Accordingly, a new loss function was constructed, and a step-by-step adjustment strategy was also designed to accelerate the convergence of the loss. Experiments on two HSI data sets were conducted to verify the superior performance of the proposed MGAN-3DCNN method.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 61871335.

ORCID

Fan Zhang http://orcid.org/0000-0002-2058-2373

References

Chen, Y., Z. Lin, X. Zhao, G. Wang, and Y. Gu. 2014. “Deep Learning-based Classification of Hyperspectral Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6): 2094–2107. doi:10.1109/JSTARS.2014.2329330.

Cover, T. M., and P. E. Hart. 1967. “Nearest Neighbour Pattern Classification.” IEEE Transactions on Information Theory 13 (1): 21–27. doi:10.1109/TIT.1967.1053964.

Deng, Y.-J., H.-C. Li, X. Song, Y.-J. Sun, X.-R. Zhang, and Q. Du. 2020. “Patch Tensor-based Multigraph Embedding Framework for Dimensionality Reduction of Hyperspectral Images.” IEEE Transactions on Geoscience and Remote Sensing 58 (3): 1630–1643. doi:10.1109/TGRS.2019.2947200.

Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. “Generative Adversarial Nets.” Advances in Neural Information Processing Systems. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

Ham, J., Y. Chen, M. M. Crawford, and J. Ghosh. 2005. “Investigation of the Random Forest Framework for Classification of Hyperspectral Data.” IEEE Transactions on Geoscience and Remote Sensing 43 (3): 492–501. doi:10.1109/TGRS.2004.842481.


Hoang, Q., T. D. Nguyen, T. Le, and D. Phung. 2018. “MGAN: Training Generative Adversarial Nets with Multiple Generators.” International Conference on Learning Representations. https://openreview.net/forum?id=rkmu5b0a-

LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. doi:10.1109/5.726791.

Li, H.-C., H. Zhou, L. Pan, and Q. Du. 2018. “Gabor Feature-based Composite Kernel Method for Hyperspectral Image Classification.” Electronics Letters 54 (10): 628–630. doi:10.1049/el.2018.0272.

Li, T., J. Zhang, and Y. Zhang. 2014. “Classification of Hyperspectral Image Based on Deep Belief Networks.” In IEEE International Conference on Image Processing. doi:10.1109/ICIP.2014.7026039.

Li, Y., H. Zhang, and Q. Shen. 2017. “Spectral-spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network.” Remote Sensing 9 (1): 67. doi:10.3390/rs9010067.

Melgani, F., and L. Bruzzone. 2004. “Classification of Hyperspectral Remote Sensing Images with Support Vector Machines.” IEEE Transactions on Geoscience and Remote Sensing 42 (8): 1778–1790. doi:10.1109/TGRS.2004.831865.

Sun, J., X. Liu, W. Wan, J. Li, D. Zhao, and H. Zhang. 2016. “Video Hashing Based on Appearance and Attention Features Fusion via DBN.” Neurocomputing 213: 84–94. doi:10.1016/j.neucom.2016.05.098.

Wang, D., B. Du, L. Zhang, and S. Chu. 2020. “Hyperspectral Image Classification Based on Multi-scale Information Compensation.” Remote Sensing Letters 11 (3): 293–302. doi:10.1080/2150704X.2019.1711238.

Xue, Z. 2020. “Semi-supervised Convolutional Generative Adversarial Network for Hyperspectral Image Classification.” IET Image Processing 14 (4): 709–719. doi:10.1049/iet-ipr.2019.0869.

Yang, J., Y. Zhao, J. C. W. Chan, and C. Yi. 2016. “Hyperspectral Image Classification Using Two-channel Deep Convolutional Neural Network.” In IEEE International Geoscience and Remote Sensing Symposium. doi:10.1109/IGARSS.2016.7730324.

Zhao, W., and S. Du. 2016. “Spectral-spatial Feature Extraction for Hyperspectral Image Classification: A Dimension Reduction and Deep Learning Approach.” IEEE Transactions on Geoscience and Remote Sensing 54 (8): 4544–4554. doi:10.1109/TGRS.2016.2543748.

Zhao, X., Y. Liang, A. J. X. Guo, and F. Zhu. 2020. “Classification of Small-scale Hyperspectral Images with Multi-source Deep Transfer Learning.” Remote Sensing Letters 11 (4): 303–312. doi:10.1080/2150704X.2020.1714772.

Zhu, L., Y. Chen, P. Ghamisi, and J. A. Benediktsson. 2018. “Generative Adversarial Networks for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 56 (9): 5046–5063. doi:10.1109/TGRS.2018.2805286.
