
GhoMR: Multi-Receptive Lightweight Residual Modules for Hyperspectral Classification.

Arijit Das¹, Indrajit Saha², Rafał Scherer³.

Abstract

In recent years, hyperspectral images (HSIs) have attracted considerable attention in computer vision (CV) due to their wide utility in remote sensing. Unlike images with three or fewer channels, HSIs have a large number of spectral bands. Recent works demonstrate the use of modern deep learning based CV techniques like convolutional neural networks (CNNs) for analyzing HSIs. CNNs have receptive fields (RFs) fueled by learnable weights, which are trained to extract useful features from images. In this work, a novel multi-receptive CNN module called GhoMR is proposed for HSI classification. GhoMR utilizes blocks containing several RFs, extracting features in a residual fashion. Each RF extracts features which are used by other RFs to extract more complex features in a hierarchical manner. However, the more RFs there are, the more weights are required and the heavier the network becomes. Most complex architectures suffer from this shortcoming. To tackle this, the recently proposed Ghost module is used as the basic building unit. Ghost modules address the feature redundancy in CNNs by extracting only a limited number of features and performing cheap transformations on them, thus reducing the overall number of parameters in the network. To test the discriminative potential of GhoMR, a simple network called GhoMR-Net is constructed using GhoMR modules, and experiments are performed on three public HSI datasets: Indian Pines, University of Pavia, and Salinas Scene. The classification performance is measured using three metrics: overall accuracy (OA), Kappa coefficient (Kappa), and average accuracy (AA). Comparisons with ten state-of-the-art architectures further demonstrate the effectiveness of the method. Although lightweight, the proposed GhoMR-Net provides comparable or better performance than other networks. The PyTorch code for this study is made available at the iamarijit/GhoMR GitHub repository.


Keywords: convolutional neural network; deep learning; feature extraction; hyperspectral image classification; multi-receptive module; remote sensing

Year: 2020 PMID: 33260347 PMCID: PMC7729750 DOI: 10.3390/s20236823

Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576

1. Introduction

Hyperspectral images (HSIs) are image cubes in which each pixel is measured as a near-continuous spectrum. Unlike RGB images, HSIs have hundreds of spectral bands, carrying information about wavelengths beyond the visible spectrum. These cubes contain both spatial and spectral information, which can be widely utilized in remote sensing for analyzing a scene of interest. Hyperspectral imaging also finds applications in agriculture [1], forestry [2,3], archaeology [4], medical analysis [5], food quality control [6], military defense [7], forensics [8], and several other domains. Consequently, research on HSI processing and analysis is growing rapidly, and many studies on the topic have been published in recent years. The high spectral dimensionality of an HSI often makes analysis difficult because of noise and high computational costs. Earlier, algorithms like independent component analysis (ICA) [9], principal component analysis (PCA) [10], and linear discriminant analysis (LDA) [11] were used to deal with this. More recently, advanced dimension reduction techniques [12,13,14] and band selection methods [15,16,17] have been proposed for the same purpose. An HSI is also subject to mixed pixels, i.e., a pixel can contain mixtures of spectra from different components (also called endmembers). This occurs either because of the low spatial resolution of the sensors or because of multiple scattering and intimate mixing effects. Spectral unmixing addresses this by retrieving all or some of the endmembers and estimating their fractional abundances in each mixed pixel. In recent years, several techniques [18,19,20] have shown satisfactory results in hyperspectral unmixing. HSI classification, which this manuscript addresses, is another widely studied task in hyperspectral imaging. It is the process of assigning a class to every pixel in an image based on its spectral and spatial features. 
Early research on HSI classification mostly focused on shallow hand-crafted techniques [21,22]. Some of these techniques [23] use a local covariance matrix representation to capture the correlation between spectral bands, which is then used by machine learning algorithms such as the support vector machine (SVM) [24] for HSI classification. Along with spectral methods, spatial feature extraction techniques like mathematical morphological transformations [25] and composite kernel learning [26,27] are also used. 3D wavelets [28] and 3D Gabor filters [29] are also efficient methods for extracting spatial features from HSIs. Other techniques [30,31,32] involving sparse representations have been developed to exploit the spatial contextual knowledge in HSIs. Although the methodologies discussed above have addressed HSI classification effectively, they can extract only a limited set of features, which carry limited useful information. This limitation has motivated the replacement of these shallow hand-engineered techniques with deep learning computer vision (CV) algorithms. This evolution is discussed in detail in a recently published comparative study [33] of shallow techniques and learning-based algorithms. The convolutional neural network (CNN) is one of the most widely used deep learning algorithms for HSI classification. A CNN is driven by receptive fields (RFs), which use trainable filters to extract features from HSIs. These filters have randomly initialized weights, which are updated automatically during training to extract the necessary information. This self-learning ability makes CNNs more robust than shallow methods and gives them greater discriminative power for distinguishing between HSI pixels. Besides HSI classification, CNN architectures proposed in recent years have also revolutionized other domains of CV. AlexNet [34], proposed in 2012, is one of the founding architectures for image classification on the ImageNet [35] dataset. 
Several architectures like VGGNet [36], GoogLeNet [37], ResNet [38], DenseNet [39] and SENet [40] followed. Methods have also been proposed for other CV tasks: R-CNN [41], Fast R-CNN [42], Faster R-CNN [43], YOLO [44] and SSD [45] for object detection; Mask R-CNN [46], SegNet [47], FCN [48] and U-Net [49] for image segmentation; RCCNet [50] for colon cancer classification; and so on. For HSI analysis, several CNN-driven architectures have been proposed in recent years. Some simple networks use 2D-CNN [51] and 3D-CNN [52]. Other networks like deformable CNN [53], super-resolution-aided CNN [54] and Two-CNN [55] use variations of 2D-CNN, while multi-scale 3D-CNN (M3D-CNN) [56], 3D-LWNet [57] and the spectral-spatial residual network (SSRN) [58] use 3D-CNN-based approaches. HybridSN [59], another state-of-the-art architecture, uses a sequential fusion of 2D and 3D CNNs to extract joint spectral-spatial information. The dual-path network (DPNet) [60], convolutional feature fusion network [61] and deep feature fusion network [62] are other fusion-based strategies for HSI classification. FuSENet [63], which uses squeeze-and-excitation modules [40], applies fusion within a single residual block. Unlike SENet, which uses global average pooling (GAP) for the squeeze operation, FuSENet uses a fusion of GAP and global max-pooling (GMP). Although these methods perform very well in HSI classification, their architectures are fairly heavy, owing to large numbers of trainable parameters. Since CNNs are computationally demanding, these architectures require expensive GPUs and hardware to train and store them. This shortcoming of earlier works inspired us to propose a multi-receptive lightweight residual block called GhoMR. A single GhoMR uses a complex strategy inspired by Res2Net [64] to extract information from HSI data. 
Each module contains multiple RFs, where each RF extracts features in a hierarchical fashion using information from other RFs in the same module. These RFs are connected with residual-like connections. However, with an increase in complexity, the number of learnable weights increases. Thus, to ensure a lightweight architecture, the Ghost module (GM) is used as the basic building unit. A single receptive layer of a CNN has multiple convolutional kernels which generate several feature maps. Research has shown [65] that many of these feature maps are similar and can be easily constructed by transforming other features. GMs take advantage of this feature redundancy in CNNs. Inside a GM, a very limited number of features are extracted from the input using a convolutional layer. Then, more features are generated from the existing ones using cheap linear operations on them. This strategy reduces the number of parameters, giving rise to a lightweight feature extraction module. The GM was first used in GhostNet [65], published in CVPR 2020, and later it became a backbone for many methods. Recently, an architecture based on GM called Improved GhostNet [66] was used for remote sensing classification as well. However, the proposed GhoMR is the first to use GM on HSIs. Stacking four such GhoMR modules, a classification network called GhoMR-Net is constructed, which is tested on three benchmark datasets and compared with state-of-the-art architectures. 
The main contributions of this research can be summarized as follows: (1) a novel lightweight multi-receptive feature extraction module called GhoMR is proposed for HSI classification; (2) GhoMR uses a complex feature extraction strategy with several internal RFs connected in a residual fashion; (3) to reduce the number of trainable parameters, Ghost modules are used, which apply low-cost transformations to address feature redundancy in CNNs; (4) an architecture called GhoMR-Net is designed from multiple GhoMR blocks to perform experiments on three public HSI datasets; and (5) comparisons verify that the proposed GhoMR gives better or comparable results relative to state-of-the-art techniques. The rest of the paper is organized as follows. Section 2 describes the proposed methodology, Section 3 describes the datasets used and discusses the experiments, comparisons, and visualizations performed on them, and Section 4 concludes our research.

2. Methodology

2.1. Brief Description of Ghost Modules

CNNs are driven by receptive kernels or filters with randomly initialized weights. These kernels traverse an input (an image or feature maps), multiply their weights element-wise with the underlying pixels, and sum the products to extract features. This operation is termed convolution. During training, sufficient examples are fed to the network over many iterations, and these weights are updated using backpropagation as the network learns to generalize to unseen examples. However, CNN architectures use many kernels to extract a wide variety of feature maps. This increases the number of trainable weights, demanding heavy computation and expensive hardware to train and store them. Let $I \in \mathbb{R}^{W \times H \times C}$ be the input to a single convolutional block, where W and H are the spatial dimensions and C is the number of channels. To extract a single feature map $f \in \mathbb{R}^{W' \times H'}$ from I, a kernel $k \in \mathbb{R}^{K \times K \times C}$ is used to perform the convolution, where $W' \le W$ and $H' \le H$. The convolution operation can be represented as

$$f = I * k,$$

where $*$ denotes convolution. Similarly, a set of N kernels is used to generate N different feature maps, which are stacked to produce a feature block $F \in \mathbb{R}^{W' \times H' \times N}$ that becomes the input for another set of kernels. This operation involves $N \cdot K \cdot K \cdot C$ parameters, which can be very large owing to large values of C and N. Thus, to reduce parameters, the number of kernels N must be optimized (assuming that C is constant). Prior research has shown that many feature maps derived by these kernels are similar to each other, so they can be generated by transforming existing ones rather than by using separate kernels. To exploit this redundancy, the Ghost module (GM) [65] was recently proposed. A GM reduces the number of kernels while keeping the loss of information minimal. Feature extraction in a GM is done in two steps. The first step involves simple convolutional operations as described above. Keeping all other hyper-parameters constant, M kernels are used to generate a set of intrinsic feature maps $F' \in \mathbb{R}^{W' \times H' \times M}$, where $M < N$. As a result, the number of parameters in this step reduces to $M \cdot K \cdot K \cdot C$. On its own, this reduction would lose significant information. To make up for the missing features, new feature maps are derived from each intrinsic feature map by performing T low-cost operations (Ghost transformations) on it. These derived features are called Ghost features. This step can be represented as

$$g_{ij} = \Phi_{ij}(f'_i), \quad i = 1, \dots, M, \quad j = 1, \dots, T,$$

where $f'_i$ is the $i$th feature map in $F'$ and $\Phi_{ij}$ is the $j$th linear operation deriving the Ghost feature $g_{ij}$ from $f'_i$. Among the T Ghost transformations applied to $f'_i$, one is kept as the identity operation to retain the original feature map, and the remaining $T - 1$ operations generate the Ghost features. Thus, a total of $M \cdot T$ features is generated, such that $N = M \cdot T$. Figure 1 shows a simple illustration of the Ghost module. For the transformation function $\Phi_{ij}$, convolutional filters of size $K_T \times K_T$ are used instead of hand-crafted low-cost linear operations. These filters are called Ghost filters. This lets the network learn the most appropriate transformations through convolution. Moreover, it gives the flexibility to experiment with different values of $K_T$, since kernels of different spatial dimensions extract different types of features. Note that the computational cost of $\Phi_{ij}$ is much lower than that of ordinary convolution; a detailed analysis is given in the founding manuscript [65].
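The two-step Ghost module described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' PyTorch implementation: it assumes "same" zero padding, no bias or batch normalization, random (untrained) weights, and, following the GhostNet design, that each Ghost filter is a depthwise $K_T \times K_T$ kernel applied to a single intrinsic map. The helper names (`conv2d_same`, `depthwise_same`, `ghost_module`, `conv_params`, `ghost_params`) are hypothetical, and C = 24, N = 48 are borrowed from the GhoMR-Net widths in Section 3.2 purely for illustration.

```python
import numpy as np


def conv2d_same(x, w):
    """'Same'-padded 2D convolution (cross-correlation, as in CNN libraries).

    x: (H, W, C_in) input, w: (K, K, C_in, C_out) kernels -> (H, W, C_out).
    """
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(h):
        for j in range(wd):
            # Sum over the K x K x C_in window for every output kernel.
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k, :], w, axes=3)
    return out


def depthwise_same(x, w):
    """Cheap Ghost transformations: each intrinsic map is filtered only by its own kernels.

    x: (H, W, M) intrinsic maps, w: (K_T, K_T, M, T-1) -> (H, W, M*(T-1)).
    """
    kt = w.shape[0]
    p = kt // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd, m = x.shape
    out = np.zeros((h, wd, m, w.shape[3]))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.einsum("abm,abmt->mt", xp[i:i + kt, j:j + kt, :], w)
    return out.reshape(h, wd, -1)


def ghost_module(x, n_out, k=3, t=2, k_t=3, seed=0):
    """Produce n_out maps: M = n_out / T intrinsic maps plus M*(T-1) Ghost maps."""
    rng = np.random.default_rng(seed)
    m = n_out // t
    primary = rng.standard_normal((k, k, x.shape[2], m)) * 0.1
    cheap = rng.standard_normal((k_t, k_t, m, t - 1)) * 0.1
    intrinsic = conv2d_same(x, primary)         # step 1: ordinary convolution
    ghosts = depthwise_same(intrinsic, cheap)   # step 2: cheap transformations
    # The identity transformation keeps the intrinsic maps themselves.
    return np.concatenate([intrinsic, ghosts], axis=-1)


def conv_params(c, n, k):
    """Weights of a plain convolution with n kernels of size k x k x c."""
    return n * k * k * c


def ghost_params(c, n, k, t, k_t):
    """Weights of the sketched Ghost module (primary conv + depthwise Ghost filters)."""
    m = n // t
    return m * k * k * c + m * (t - 1) * k_t * k_t


x = np.random.default_rng(1).standard_normal((5, 5, 24))
y = ghost_module(x, n_out=48)
print(y.shape)                              # (5, 5, 48)
print(conv_params(24, 48, 3))               # 10368
print(ghost_params(24, 48, 3, t=2, k_t=3))  # 5400
```

With C = 24, N = 48, K = 3, T = 2 and $K_T$ = 3, the sketch needs 5,400 weights instead of 10,368, close to the factor-of-T saving the Ghost module is designed to provide.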

Figure 1

An illustration of the Ghost module.

2.2. GhoMR—Proposed Multi-Receptive Module for HSI Classification

Figure 2 shows the diagram of a single GhoMR module, which is the proposed backbone for HSI classification. A GhoMR uses multiple internal GMs to extract features in a residual hierarchical fashion. This strategy is inspired by Res2Net [64] and is useful for extracting complex details from the HSI cube. Let the input for an arbitrary GhoMR module be $I \in \mathbb{R}^{W \times H \times C}$, where W, H, and C are the width, height, and number of channels, respectively. Feature extraction from this cube is done in three steps:

Figure 2

Proposed GhoMR module.

First, a GM with $K \times K$ kernels is used to extract the feature block $F \in \mathbb{R}^{W \times H \times N}$. Note that these kernels are not the Ghost filters; they generate the intrinsic feature maps. For the Ghost filters, experiments with different sizes ($K_T \in \{3, 5, 7\}$) are performed, as discussed in Section 3. Next, the N feature maps of F are split into four subsets, denoted by $x_i$, where $i \in \{1, 2, 3, 4\}$. Except for $x_1$, each subset is passed through a GM. The output of the previous GM, $y_{i-1}$, is fused hierarchically with the current subset $x_i$ by element-wise summation to produce the set of features $y_i$. This operation can be written as

$$y_i = \begin{cases} x_i, & i = 1, \\ \mathrm{GM}_i(x_i), & i = 2, \\ \mathrm{GM}_i(x_i + y_{i-1}), & i = 3, 4, \end{cases}$$

where + refers to element-wise summation. Note that the GM for the first split is omitted in order to reuse features and reduce parameters in the module. Finally, the output maps $y_1$, $y_2$, $y_3$, and $y_4$ are concatenated along the depth axis to form a single feature block containing all the information. This block is passed through a GM and fused with the input I through a residual connection to produce the final output O:

$$O = \mathrm{GM}(y_1 \oplus y_2 \oplus y_3 \oplus y_4) + I,$$

where ⊕ refers to concatenation and + denotes element-wise summation.
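The three steps of a GhoMR module can be sketched as below. To keep the example short and self-contained, every Ghost module is replaced by a hypothetical stand-in, `make_gm` (a random 1 × 1 channel-mixing map followed by ReLU), which only mimics a GM's input and output shapes; the split, hierarchical fusion, concatenation, and residual wiring is what the sketch illustrates. It also assumes the module's output has as many channels as its input so that the residual sum is defined, and `build_ghomr` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_gm(c_in, c_out):
    """Hypothetical stand-in for a Ghost module (reproduces shape behaviour only)."""
    weight = rng.standard_normal((c_in, c_out)) * 0.1
    return lambda x: np.maximum(x @ weight, 0.0)  # (H, W, c_in) -> (H, W, c_out)


def build_ghomr(channels, n_mid=48, splits=4):
    """Return a forward function for one GhoMR-style module."""
    width = n_mid // splits
    first_gm = make_gm(channels, n_mid)
    split_gms = [make_gm(width, width) for _ in range(splits - 1)]  # no GM for x_1
    last_gm = make_gm(n_mid, channels)

    def forward(inp):
        xs = np.split(first_gm(inp), splits, axis=-1)    # x_1 ... x_4
        ys = [xs[0]]                                      # y_1 = x_1
        ys.append(split_gms[0](xs[1]))                    # y_2 = GM(x_2)
        for i in range(2, splits):
            ys.append(split_gms[i - 1](xs[i] + ys[-1]))   # y_i = GM(x_i + y_{i-1})
        fused = np.concatenate(ys, axis=-1)               # y_1 ⊕ y_2 ⊕ y_3 ⊕ y_4
        return last_gm(fused) + inp                       # residual connection

    return forward


ghomr = build_ghomr(channels=24)
out = ghomr(rng.standard_normal((11, 11, 24)))
print(out.shape)  # (11, 11, 24)
```

Because $y_1$ skips its GM and each later split reuses the previous output, the deeper splits see progressively larger effective receptive fields at little extra cost.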

3. Experiments and Discussion

3.1. Datasets

The proposed methodology is evaluated on three public HSI datasets (http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes), described as follows.

Indian Pines (IP): the images in this dataset were collected in 1992 over the Indian Pines test site in north-western Indiana using the AVIRIS [67] sensor. The HSI cube has a spatial dimension of 145 × 145 pixels with 224 spectral bands in the wavelength range of 400 to 2500 nm, among which 24 bands corresponding to regions of water absorption were eliminated. Of the 145 × 145 pixels, 10,249 are annotated with ground truth from a set of 16 different vegetation classes.

University of Pavia (UP): this dataset was acquired in 2001 over the university campus at Pavia, Northern Italy, using the ROSIS sensor. It has a spatial dimension of 610 × 340 pixels and 103 spectral bands with wavelengths between 430 and 860 nm. The ground truth is a set of 9 urban land-cover classes, and approx. 20% of the total pixels (42,776) are annotated with this information.

Salinas Scene (SA): this dataset was collected over Salinas Valley, California, in 1998 using the AVIRIS sensor. The spatial dimension is 512 × 217 pixels, and the spectral information is encoded in 224 bands with wavelengths in the range of 360 to 2500 nm. As with IP, 20 spectral bands affected by water absorption are discarded. The ground truth contains 16 different classes of vegetables, bare soils, and vineyard fields.

3.2. Experimental Protocols

Using several GhoMRs, a network called GhoMR-Net is built as shown in Figure 3. First, the input is fed to a simple convolutional layer with 24 kernels. The output is then passed through a series of four GhoMR modules, which produce 24, 36, 48, and 60 feature maps, respectively. Inside each GhoMR, the first GM generates 48 feature maps from the input, which are split into four parts of 12 feature maps each. The GMs operating on the splits ($x_2$, $x_3$, $x_4$) each extract 12 feature maps, which are concatenated again into a single block of 48 maps. This block is fed to the final GM, which outputs the set of features for the next GhoMR block. To increase efficiency, batch normalization [68] and ReLU activation are applied after every GM. Global average pooling (GAP) [69] is performed on the features extracted by the final GhoMR, and the resulting vector is fed to a fully connected (FC) layer that outputs the class probabilities. The class with the maximum probability is the predicted class.
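The classification head described above (GAP over the final feature maps, a fully connected layer, and selection of the most probable class) can be sketched in NumPy. The weights here are toy placeholders rather than trained values, the softmax is an assumption about how the FC outputs become probabilities, and `classify_head` is a hypothetical name; the 60-channel input matches the last GhoMR module's output width given above.

```python
import numpy as np


def classify_head(features, fc_weight, fc_bias):
    """features: (H, W, C) maps from the last GhoMR; fc_weight: (C, L); fc_bias: (L,)."""
    pooled = features.mean(axis=(0, 1))      # global average pooling -> (C,)
    logits = pooled @ fc_weight + fc_bias    # fully connected layer -> (L,)
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    probs /= probs.sum()
    return probs, int(np.argmax(probs))      # predicted class = max probability


n_channels, n_classes = 60, 16               # e.g. 16 classes for IP
features = np.ones((11, 11, n_channels))
fc_weight = np.zeros((n_channels, n_classes))
fc_weight[:, 2] = 0.1                        # toy weights favoring class 2
probs, pred = classify_head(features, fc_weight, np.zeros(n_classes))
print(pred)                                   # 2
```

Because GAP collapses the spatial axes, the FC layer in this sketch (assuming a bias term) adds only 60 × 16 + 16 = 976 weights for IP, regardless of the input patch size.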

Figure 3

GhoMR-Net: Proposed HSI classification network.

The above architecture is trained to classify each pixel of an HSI cube $X \in \mathbb{R}^{P \times Q \times D}$, where $P \times Q$ is the spatial size and D is the number of spectral bands. This 3D cube has hundreds of spectral channels containing redundant information, which makes classification difficult and increases computational costs. Thus, principal component analysis (PCA) is performed along the spectral axis. The PCA-reduced cube $X_{\mathrm{PCA}} \in \mathbb{R}^{P \times Q \times S}$ retains the spatial information and reduces the number of channels to S, where S is 30 for IP and 15 for both UP and SA. $X_{\mathrm{PCA}}$ is then divided into spatially overlapping 3D patches $x \in \mathbb{R}^{W \times W \times S}$, where W is the spatial dimension of a patch. The ground truth assigned to each patch is that of its central pixel. These 3D patches are fed to the proposed GhoMR-Net, which outputs a vector $\hat{y} \in \mathbb{R}^{L}$, where L is the number of classes. The cross-entropy loss is then calculated between $\hat{y}$ and the ground truth y, and the network is trained to minimize this loss. As discussed in Section 2, the GMs used in the GhoMR blocks have two hyperparameters: the number of Ghost transformations (T) and the spatial size of the Ghost filters ($K_T$). As T increases, fewer raw features are extracted from the input and more are derived using Ghost operations, reducing the number of parameters. Conversely, a larger $K_T$ means larger filters and thus more trainable parameters. Performance with different combinations of T and $K_T$ is discussed in the next subsection, along with experiments on different spatial sizes (W) of input patches and different training ratios. All experiments are run using PyTorch 1.6.0 with CUDA in the GPU environment of Google Colaboratory. The architecture is trained with the Adam [70] optimizer for 100 epochs, with a batch size of 100 and a learning rate of 0.001. The code for this research is available at https://github.com/iamarijit/GhoMR. To measure performance, three standard evaluation metrics are used: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. 
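The preprocessing steps in the paragraph above (PCA along the spectral axis, then overlapping W × W patches labeled by their central pixel) might look like the NumPy sketch below. It is not taken from the authors' repository: it assumes zero padding at the image borders, that label 0 marks unannotated pixels, and that only annotated pixels yield patches. `pca_reduce` and `extract_patches` are hypothetical names, and the random cube stands in for real data.

```python
import numpy as np


def pca_reduce(cube, s):
    """Project every pixel spectrum of an (H, W, D) cube onto its first s principal components."""
    h, w, d = cube.shape
    flat = cube.reshape(-1, d)
    flat = flat - flat.mean(axis=0)                  # centre each band
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:s].T).reshape(h, w, s)        # (H, W, s)


def extract_patches(cube, labels, size):
    """Overlapping size x size patches centred on each annotated pixel (label > 0)."""
    p = size // 2
    padded = np.pad(cube, ((p, p), (p, p), (0, 0)))  # zero padding at the borders
    patches, targets = [], []
    for r, c in zip(*np.nonzero(labels)):
        patches.append(padded[r:r + size, c:c + size, :])
        targets.append(labels[r, c] - 1)             # class of the central pixel, 0-based
    return np.stack(patches), np.array(targets)


rng = np.random.default_rng(0)
cube = rng.standard_normal((20, 20, 50))             # toy stand-in for an HSI cube
labels = rng.integers(0, 4, size=(20, 20))           # 0 = unlabelled, 1..3 = classes
reduced = pca_reduce(cube, s=15)                     # S = 15 as used for UP and SA
x, y = extract_patches(reduced, labels, size=11)
print(reduced.shape, x.shape[1:])                    # (20, 20, 15) (11, 11, 15)
```

Padding before slicing keeps every patch the same size, so pixels near the image border still receive a full W × W context window.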
OA measures the fraction of test samples that are correctly classified, AA is the average of the class-wise accuracies, and Kappa measures the degree of agreement between the ground truth and the predicted classification map. Each experiment is repeated five times, and OA, AA, and Kappa are reported as mean ± standard deviation. Based on these metrics and the above-mentioned hyperparameters, five sets of analyses are carried out to demonstrate the classification potential and lightweight nature of the proposed GhoMR-Net:

The first experiment calculates the class-wise accuracies, OA, AA, and Kappa for the IP, UP, and SA datasets using 20% and 10% training data. The 3D spectral-spatial inputs use the same spatial window for all three datasets, and T and $K_T$ are set to 2 and 3, respectively.

In the second experiment, OA, AA, and Kappa are measured on the three datasets for different values of T and $K_T$, such that $T \in \{2, 4\}$ and $K_T \in \{3, 5, 7\}$, giving a comparative study of all six combinations. This experiment is conducted on 10% training data with 3D input cubes of the same spatial window.

In the third experiment, the proposed architecture is compared with the following state-of-the-art techniques: SVM [24], 2D-CNN [51], 3D-CNN [52], M3D-CNN [56], Two-CNN [55], SSRN [58], HybridSN [59], SENet [63] (with global average pooling and with global max pooling) and FuSENet [63]. Comparisons are shown for both 10% and 20% training data, with the same input spatial window.

The fourth experiment measures OA, AA, and Kappa with less training data (5% and 3%) and smaller input patches (13 × 13 and 11 × 11), with T and $K_T$ kept at 2 and 3, respectively.

The final experiment demonstrates the effectiveness of GhoMR-Net using t-SNE visualization [71] and confusion matrices. In addition, the number of trainable parameters in the network is compared with that of other state-of-the-art architectures.
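For reference, the OA, AA, and Kappa metrics defined above can be computed from a confusion matrix as in the sketch below (rows are ground-truth classes, columns are predictions). Kappa here is Cohen's kappa, the usual definition in HSI classification, although the paper does not spell out its formula; the function names are hypothetical, and the sketch assumes every class has at least one test sample.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)           # count each (truth, prediction) pair
    return cm


def oa_aa_kappa(cm):
    total = cm.sum()
    oa = np.trace(cm) / total                    # fraction of correctly classified samples
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))   # mean of the class-wise accuracies
    # Agreement expected by chance, from the row and column marginals.
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa


y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
oa, aa, kappa = oa_aa_kappa(confusion_matrix(y_true, y_pred, 3))
print(round(oa, 4), round(aa, 4), round(kappa, 4))  # 0.8333 0.8889 0.7391
```

The toy example shows why the three metrics differ: one error in the largest class costs OA 1/6, but AA weights each class equally, and Kappa further discounts the agreement expected by chance.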

3.3. Classification Results and Visualizations

The first experiment was conducted to calculate the class-wise accuracies for the three datasets, using hyperspectral inputs with a common spatial window for all three datasets. The results are shown in Table 1 and Table 2 for 20% and 10% training data, respectively. For each dataset, the first three columns contain the class labels and data distribution (training and test samples), while the fourth column shows the accuracy (%) for each class. The last four rows of each table give the overall accuracy (OA), Kappa coefficient, average accuracy (AA), and training time for each experiment. With 20% training data, the OAs obtained are 99.54%, 99.90%, and 99.99%, while with 10% training data they are 98.64%, 99.75%, and 99.98% for IP, UP, and SA, respectively. GhoMR-Net performs worse on IP than on UP and SA, which can be explained by the smaller number of training examples and the significant class imbalance. To better understand the results, the ground-truth and predicted classification maps for IP, UP, and SA are shown in Figure 4, Figure 5, and Figure 6, respectively.

Table 1

Data distribution along with class-wise accuracies, OAs, Kappas, AAs, and training time on the IP, UP, and SA datasets, respectively, for 20% training data.

IP				UP				SA
Name	Training	Test	Accuracy	Name	Training	Test	Accuracy	Name	Training	Test	Accuracy
Alfalfa	9	37	100±0.0	Asphalt	1326	5305	100±0.0	Brocoli_green_weeds_1	402	1607	100±0.0
Corn-notill	285	1143	98.81±0.3	Meadows	3730	14,919	100±0.0	Brocoli_green_weeds_2	745	2981	100±0.0
Corn-mintill	166	664	99.70±0.2	Gravel	420	1679	99.96±0.0	Fallow	395	1581	100±0.0
Corn	47	190	100±0.0	Trees	613	2451	99.00±0.2	Fallow_rough_plow	279	1115	99.98±0.0
Grass-pasture	97	386	99.79±0.2	Painted metal sheets	269	1076	99.93±0.1	Fallow_smooth	536	2142	99.86±0.2
Grass-trees	146	584	99.66±0.1	Bare Soil	1006	4023	100±0.0	Stubble	792	3167	100±0.0
Grass-pasture-mowed	6	22	100±0.0	Bitumen	266	1064	100±0.0	Celery	716	2863	100±0.0
Hay-windrowed	96	382	100±0.0	Self-Blocking Bricks	736	2946	99.72±0.1	Grapes_untrained	2254	9017	100±0.0
Oats	4	16	97.50±3.1	Shadows	189	758	99.82±0.1	Soil_vinyard_develop	1240	4963	100±0.0
Soybean-notill	194	778	99.54±0.2					Corn_senesced_green_weeds	656	2622	100±0.0
Soybean-mintill	491	1964	99.80±0.1					Lettuce_romaine_4wk	214	854	100±0.0
Soybean-clean	118	475	98.27±0.5					Lettuce_romaine_5wk	385	1542	100±0.0
Wheat	41	164	99.88±0.2					Lettuce_romaine_6wk	183	733	100±0.0
Woods	253	1012	100±0.0					Lettuce_romaine_7wk	214	856	100±0.0
Buildings-Grass-Trees-Drives	77	309	99.94±0.1					Vinyard_untrained	1453	5815	100±0.0
Stone-Steel-Towers	19	74	95.95±0.0					Vinyard_vertical_trellis	361	1446	100±0.0
OA	2049	8200	99.54±0.0	OA	8555	34,221	99.90±0.0	OA	10,825	43,304	99.99±0.0
Kappa			99.47±0.0	Kappa			99.86±0.0	Kappa			99.99±0.0
AA			99.30±0.2	AA			99.82±0.0	AA			99.99±0.0
Training time	3 min 34 s			Training time	13 min 50 s			Training time	17 min 52 s

Table 2

Data distribution along with class-wise accuracies, OAs, Kappas, AAs, and training time on the IP, UP, and SA datasets, respectively, for 10% training data.

IP				UP				SA
Name	Training	Test	Accuracy	Name	Training	Test	Accuracy	Name	Training	Test	Accuracy
Alfalfa	5	41	98.54±2.0	Asphalt	663	5968	100±0.0	Brocoli_green_weeds_1	201	1808	100±0.0
Corn-notill	143	1285	96.45±0.8	Meadows	1865	16,784	100±0.0	Brocoli_green_weeds_2	372	3354	100±0.0
Corn-mintill	83	747	99.46±0.4	Gravel	210	1889	99.63±0.2	Fallow	197	1779	100±0.0
Corn	24	213	99.53±0.3	Trees	306	2758	98.61±0.2	Fallow_rough_plow	139	1255	99.97±0.1
Grass-pasture	48	435	99.54±0.3	Painted metal sheets	134	1211	99.9±0.1	Fallow_smooth	268	2410	99.85±0.2
Grass-trees	73	657	99.24±0.4	Bare Soil	503	4526	100±0.0	Stubble	396	3563	99.99±0.0
Grass-pasture-mowed	3	25	100±0.0	Bitumen	133	1197	100±0.0	Celery	358	3221	99.93±0.1
Hay-windrowed	48	430	100±0.0	Self-Blocking Bricks	368	3314	99.47±0.2	Grapes_untrained	1127	10,144	100±0.0
Oats	2	18	90.00±12.4	Shadows	95	852	96.38±0.6	Soil_vinyard_develop	620	5583	100±0.0
Soybean-notill	97	875	98.08±0.8					Corn_senesced_green_weeds	328	2950	100±0.0
Soybean-mintill	245	2210	99.28±0.2					Lettuce_romaine_4wk	107	961	100±0.0
Soybean-clean	59	534	95.73±3.0					Lettuce_romaine_5wk	193	1734	100±0.0
Wheat	20	185	99.46±0.5					Lettuce_romaine_6wk	91	825	100±0.0
Woods	126	1139	100±0.0					Lettuce_romaine_7wk	107	963	100±0.0
Buildings-Grass-Trees-Drives	39	347	98.90±0.9					Vinyard_untrained	727	6541	100±0.0
Stone-Steel-Towers	9	84	93.81±5.5					Vinyard_vertical_trellis	181	1626	100±0.0
OA	1024	9225	98.64±0.2	OA	4277	38,499	99.75±0.0	OA	5412	48,717	99.98±0.0
Kappa			98.45±0.3	Kappa			99.67±0.0	Kappa			99.98±0.0
AA			98.00±0.8	AA			99.33±0.1	AA			99.98±0.0
Training time	2 min 58 s			Training time	11 min 20 s			Training time	14 min 20 s

Figure 4

Classification maps for IP (a) False color image (b) Ground-Truth (c,d) Predicted maps for the two training ratios (20% and 10%).

Figure 5

Classification maps for UP (a) False color image (b) Ground-Truth (c,d) Predicted maps for the two training ratios (20% and 10%).

Figure 6

Classification maps for SA (a) False color image (b) Ground-Truth (c,d) Predicted maps for the two training ratios (20% and 10%).

In the second set of experiments, the dependence on the hyperparameters T and $K_T$ is explored. The OAs, Kappas, and AAs for different combinations of T and $K_T$ are given in Table 3. On IP and SA, the model performs best when T = 2 and $K_T$ = 3, i.e., when 2 Ghost transformations with 3 × 3 filters are used. Unlike on IP and SA, the performance on UP increases as $K_T$ is increased. A larger $K_T$ increases the number of parameters, and since IP and SA have more classes (16) and, on average, fewer training samples per class, the tendency to overfit grows with $K_T$, so performance on the test set decreases. Fixing T and $K_T$ to 2 and 3, respectively, GhoMR-Net is compared with ten state-of-the-art techniques using 10% and 20% training samples. The spatial window of the input is kept the same as in the previous experiments. On IP with 10% training data, GhoMR-Net outperforms FuSENet, SSRN, and HybridSN, improving OA by 0.53, 0.19, and 0.25 percentage points, respectively. Improvements or comparable results are obtained on UP and SA as well, as reported in Table 4. Despite having very few parameters, GhoMR-Net achieves these results, which can be explained by the multi-receptive feature extraction strategy of the GhoMR modules.
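The trade-off between T and $K_T$ can be illustrated with the parameter count of a single Ghost module. The sketch below assumes, as in GhostNet, one 3 × 3 primary convolution and depthwise $K_T \times K_T$ Ghost filters without bias; the sizes C = 24 and N = 48 mirror the first GM of the first GhoMR block for illustration only, so the printed numbers are not the paper's measured totals for GhoMR-Net, and `ghost_params` is a hypothetical helper.

```python
def ghost_params(c, n, k, t, k_t):
    """Weights in one Ghost module producing n maps from c input channels."""
    m = n // t                          # intrinsic maps from ordinary convolution
    primary = m * k * k * c             # M kernels of size K x K x C
    cheap = m * (t - 1) * k_t * k_t     # T-1 depthwise Ghost filters per intrinsic map
    return primary + cheap


for t in (2, 4):
    row = [ghost_params(24, 48, 3, t, k_t) for k_t in (3, 5, 7)]
    print(f"T={t}: K_T=3,5,7 -> {row}")
# T=2: K_T=3,5,7 -> [5400, 5784, 6360]
# T=4: K_T=3,5,7 -> [2916, 3492, 4356]
```

Under these assumptions, raising T shrinks the count while raising $K_T$ grows it, matching the qualitative behaviour discussed above.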

Table 3

OAs, Kappas, and AAs obtained for different values of T (no. of Ghost transformations) and $K_T$ (Ghost filter size) on the IP, UP, and SA datasets, respectively (for 10% training data).

T	K_T	IP			UP			SA
T	K_T	OA	Kappa	AA	OA	Kappa	AA	OA	Kappa	AA
2	3	98.64±0.2	98.45±0.3	98.00±0.8	99.75±0.0	99.67±0.0	99.33±0.1	99.98±0.0	99.98±0.0	99.98±0.0
2	5	98.51±0.2	98.30±0.2	98.26±0.2	99.77±0.0	99.70±0.0	99.42±0.1	99.97±0.0	99.97±0.0	99.96±0.0
2	7	98.50±0.2	98.29±0.2	98.17±0.5	99.78±0.0	99.71±0.0	99.40±0.1	99.96±0.0	99.96±0.0	99.95±0.0
4	3	98.19±0.3	97.94±0.3	97.67±0.9	99.72±0.1	99.64±0.1	99.26±0.1	99.98±0.0	99.97±0.0	99.97±0.0
4	5	98.12±0.4	97.86±0.5	96.80±0.8	99.80±0.0	99.74±0.0	99.47±0.1	99.97±0.0	99.97±0.0	99.97±0.0
4	7	98.17±0.1	97.91±0.1	97.32±0.7	99.83±0.0	99.77±0.0	99.56±0.1	99.96±0.0	99.96±0.0	99.96±0.0

Table 4

OAs, Kappas, and AAs using the proposed GhoMR-Net and other state-of-the-art methods on 10% and 20% training samples.

Training	Methods	IP			UP			SA
Training	Methods	OA	Kappa	AA	OA	Kappa	AA	OA	Kappa	AA
10%	SVM	81.67±0.6	78.76±0.8	79.84±3.4	90.58±0.5	87.21±0.7	92.99±0.4	94.46±0.1	93.13±0.3	93.01±0.6
	2D-CNN	80.27±1.2	78.26±2.1	68.32±4.1	96.63±0.2	95.53±1.0	94.84±1.4	96.34±0.3	95.93±0.9	94.36±0.5
	3D-CNN	82.62±0.1	79.25±0.3	76.51±0.1	96.34±0.2	94.90±1.2	97.03±0.6	85.00±0.1	83.20±0.7	89.63±0.2
	M3D-CNN	81.39±2.6	81.20±2.0	75.22±0.7	95.95±0.6	93.40±0.4	97.52±1.0	94.20±0.8	93.61±0.3	96.66±0.5
	Two-CNN	96.71±0.1	96.10±0.1	96.16±0.1	97.71±0.1	97.62±0.1	97.45±0.2	97.12±0.3	96.98±0.2	97.00±0.2
	SENet (GMP)	97.48±0.3	97.84±0.2	97.91±0.3	97.56±0.5	97.41±0.4	97.47±0.4	98.88±0.1	98.93±0.2	99.01±0.1
	SENet (GAP)	97.62±0.3	97.91±0.2	97.88±0.3	97.53±0.6	97.48±0.5	97.52±0.5	99.11±0.2	98.89±0.2	99.06±0.2
	FuSENet	98.11±0.2	98.25±0.2	98.32±0.2	97.65±0.3	97.69±0.3	97.68±0.4	99.23±0.1	98.97±0.2	99.16±0.1
	SSRN	98.45±0.2	98.23±0.3	86.19±1.3	99.62±0.0	99.50±0.0	99.49±0.0	99.64±0.0	99.60±0.0	99.76±0.0
	HybridSN	98.39±0.4	98.16±0.5	98.01±0.5	99.72±0.1	99.64±0.2	99.20±0.2	99.98±0.0	99.98±0.0	99.98±0.0
	GhoMR-Net	98.64±0.2	98.45±0.3	98.00±0.8	99.75±0.0	99.67±0.0	99.33±0.1	99.98±0.0	99.98±0.0	99.98±0.0
20%	SVM	86.24±0.4	84.27±0.5	83.15±1.1	95.20±0.1	93.63±0.2	93.60±0.1	94.15±0.1	93.48±0.1	97.23±0.1
	2D-CNN	86.90±1.3	85.01±1.6	82.70±1.0	96.02±0.4	96.04±0.3	95.10±0.1	96.15±0.6	95.71±0.7	98.27±0.2
	3D-CNN	89.23±0.2	87.70±0.3	87.87±0.1	97.30±0.3	96.22±0.1	97.02±0.1	94.54±0.5	93.81±0.3	96.79±0.6
	M3D-CNN	93.67±0.1	92.70±0.3	93.60±0.6	97.41±0.2	96.05±0.6	98.22±0.1	94.92±0.3	94.40±0.1	97.28±0.2
	Two-CNN	98.73±0.2	98.71±0.2	98.73±0.2	98.72±0.3	98.40±0.2	98.45±0.2	98.13±0.4	98.01±0.2	98.10±0.2
	SENet (GMP)	98.53±0.6	98.27±0.8	97.91±1.5	99.05±0.2	98.81±0.2	98.86±0.2	99.07±0.3	99.19±0.2	99.13±0.2
	SENet (GAP)	98.76±0.5	98.43±0.7	98.20±1.0	99.36±0.1	99.20±0.1	99.30±0.1	99.50±0.1	99.55±0.1	99.40±0.1
	FuSENet	99.01±0.1	98.60±0.1	98.64±0.1	99.42±0.2	99.21±0.3	99.33±0.2	99.68±0.2	99.74±0.1	99.69±0.1
	SSRN	99.23±0.1	99.12±0.1	92.52±0.1	99.77±0.1	99.69±0.2	99.71±0.1	99.88±0.0	99.87±0.0	99.84±0.0
	HybridSN	99.47±0.1	99.40±0.1	99.38±0.1	99.86±0.1	99.82±0.0	99.71±0.1	100±0.0	100±0.0	100±0.0
	GhoMR-Net	99.54±0.0	99.47±0.0	99.30±0.2	99.90±0.0	99.86±0.0	99.82±0.0	99.99±0.0	99.99±0.0	99.99±0.0

In the next experiment, the robustness of the approach and the influence of the input spatial dimensions are explored. This is performed with fewer training samples, i.e., 5% and 3%, using inputs of spatial size 13 × 13 and 11 × 11. The OAs, AAs, and Kappas given in Table 5 show that, as expected, performance deteriorates for all three datasets compared with the 10% results. The classification maps for IP given in Figure 7 further confirm this. Increasing the spatial size improves performance on IP and SA, since more spatial context is captured. In UP, however, the labeled regions are small and fragmented (see Figure 5), unlike in IP and SA, so larger patches capture more noise, which reduces the classification accuracy.

Table 5

OAs, Kappas, and AAs (in %) with fewer training samples and smaller input spatial sizes on the IP, UP, and SA datasets.

Training Samples	Spatial Size	IP OA	IP Kappa	IP AA	UP OA	UP Kappa	UP AA	SA OA	SA Kappa	SA AA
5%	13 × 13	95.42±0.9	94.77±1.0	84.68±5.1	99.58±0.1	99.44±0.1	99.18±0.1	99.77±0.1	99.74±0.1	99.81±0.1
5%	11 × 11	94.23±0.1	93.42±0.1	84.72±2.1	99.61±0.0	99.49±0.1	99.28±0.1	99.62±0.1	99.58±0.1	99.73±0.0
3%	13 × 13	89.48±1.7	87.96±2.0	73.48±2.4	99.34±0.1	99.13±0.1	98.76±0.2	99.85±0.0	99.83±0.0	99.85±0.1
3%	11 × 11	87.95±1.2	86.23±1.4	72.75±3.6	99.41±0.1	99.22±0.1	99.00±0.1	99.57±0.2	99.52±0.2	99.71±0.1
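The three metrics in Table 5 can all be derived from a confusion matrix such as those in Figure 9. The sketch below uses the standard definitions (OA as the fraction of correctly classified test pixels, AA as the unweighted mean of the per-class accuracies, and Cohen's kappa); the function name `classification_metrics` and the rows-are-true-classes convention are assumptions, and the code is not taken from the authors' repository.

```python
import numpy as np

def classification_metrics(conf):
    """Return (OA, AA, kappa) from a confusion matrix.

    ASSUMED convention: conf[i, j] counts test pixels of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                    # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)   # accuracy (recall) of each class
    aa = per_class.mean()                          # every class weighted equally
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2  # expected chance agreement
    kappa = (oa - pe) / (1 - pe)                   # agreement beyond chance
    return oa, aa, kappa

# Imbalanced toy example: a large class classified well, a small class poorly
oa, aa, kappa = classification_metrics([[90, 10],
                                        [5, 5]])
print(f"OA={oa:.4f} AA={aa:.4f} kappa={kappa:.4f}")  # OA=0.8636 AA=0.7000 kappa=0.3265
```

Because AA weights each class equally, a few poorly classified small classes pull it down much more than OA. This is consistent with the wide gap between OA and AA on IP in Table 5 when only 3% of the samples are used for training.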

Figure 7

Predicted classification maps for IP with 13 × 13 and 11 × 11 input spatial sizes for (a,b) 5% training data and (c,d) 3% training data, respectively.

Finally, a set of visualizations is produced to demonstrate the discriminative power of GhoMR-Net. The high-dimensional features from the global average pooling (GAP) layer of the network are extracted for each sample in the test set and reduced to two-dimensional coordinates via t-SNE. These coordinates are plotted in Figure 8 for the three datasets. The features of pixels with the same ground-truth class, drawn in the same color, clearly form compact clusters. Moreover, the confusion matrices obtained on the test data are given in Figure 9. Furthermore, the total number of trainable parameters is compared with that of seven of the above-mentioned architectures: 3D-CNN [52], M3D-CNN [56], Two-CNN [55], HybridSN [59], SENet [63], FuSENet [63], and SSRN [58]. As shown in Figure 10, the proposed network has only 32,704 trainable parameters, far fewer than HybridSN, SSRN, and FuSENet, which have 5,122,176, 500,384, and 128,848 parameters, respectively.
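To make the size gap in Figure 10 concrete, the ratios below are computed directly from the parameter counts quoted in the text; nothing else is assumed.

```python
# Trainable-parameter counts quoted in the text (Figure 10)
params = {
    "GhoMR-Net": 32_704,
    "HybridSN": 5_122_176,
    "SSRN": 500_384,
    "FuSENet": 128_848,
}

base = params["GhoMR-Net"]
ratios = {name: count / base for name, count in params.items() if name != "GhoMR-Net"}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the parameters of GhoMR-Net")
# HybridSN: 156.6x, SSRN: 15.3x, FuSENet: 3.9x
```

Even the smallest of the three, FuSENet, needs roughly four times as many trainable weights as GhoMR-Net.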

Figure 8

Visualization of the extracted features via t-SNE, where each 2D point denotes a sample and different colors represent different classes, for the (a) IP, (b) UP, and (c) SA datasets.

Figure 9

Confusion matrices obtained on the test samples for the (a) IP, (b) UP, and (c) SA datasets.

Figure 10

Number of trainable parameters in the proposed GhoMR-Net and other state-of-the-art architectures.

4. Conclusions

In this study, a lightweight multi-receptive module called GhoMR is proposed for hyperspectral image (HSI) classification. It contains several internally connected receptive fields (RFs) that extract complex features from HSIs hierarchically. Unlike other approaches that use standard convolutional layers, GhoMR uses the recently introduced Ghost modules as RFs; these extract only a handful of features from the input and derive the remaining ones from them through cheap transformations. Using GhoMR blocks, a simple lightweight architecture called GhoMR-Net is designed and evaluated on three standard datasets. The classification results are measured using three metrics and compared with those of other state-of-the-art techniques. Experiments with less training data and smaller input spatial sizes are also performed, along with several visualizations and plots, to better understand the discriminative potential of the architecture.
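The parameter saving attributed to Ghost modules can be illustrated with a simple weight count. The sketch below follows the Ghost module formulation of GhostNet: a primary k × k convolution produces m = n/s intrinsic feature maps, and cheap depthwise d × d operations generate the remaining (s - 1)m maps. It assumes 2D convolutions, ignores biases and batch-norm parameters, and uses the GhostNet defaults s = 2 and d = 3; the hyperparameters actually used inside GhoMR may differ, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k 2D convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights of a GhostNet-style Ghost module producing c_out maps (bias/BN ignored).

    A primary k x k convolution yields m = c_out // s intrinsic maps; a depthwise
    d x d convolution then derives s - 1 "ghost" maps from each intrinsic map.
    ASSUMES c_out is divisible by s.
    """
    m = c_out // s
    primary = c_in * m * k * k      # ordinary convolution, but only m output maps
    cheap = m * (s - 1) * d * d     # one d x d filter per ghost map
    return primary + cheap

std = conv_params(64, 64, 3)
ghost = ghost_params(64, 64, 3)
print(std, ghost, round(std / ghost, 2))  # 36864 18720 1.97
```

For a large number of input channels the saving approaches a factor of s, because the cheap depthwise term becomes negligible next to the primary convolution.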
