Debaleena Datta1, Pradeep Kumar Mallick1, Akash Kumar Bhoi2,3,4, Muhammad Fazal Ijaz5, Jana Shafi6, Jaeyoung Choi7.
Abstract
Hyperspectral imagery has become a major focus of recent advances in imaging science and remote sensing. Current intelligent techniques, such as support vector machines, sparse representations, active learning, extreme learning machines, transfer learning, and deep learning, are typically based on machine learning. These techniques bring precision and fidelity to the processing of such three-dimensional, multiband, high-resolution images. This article presents an extensive survey of the contributions of machine learning and deep learning technologies to land-cover classification based on hyperspectral images. The objective of this study is threefold. First, after reviewing a large pool of Web of Science (WoS)-, Scopus-, SCI-, and SCIE-indexed and related articles, we provide a novel, fully systematic approach to review work that aids in identifying research gaps and formulating research questions. Second, we highlight contemporary advances in machine learning (ML) methods for classifying hyperspectral images, with a brief, organized overview and a thorough assessment of the literature involved. Finally, we draw conclusions to help researchers deepen their understanding of the relationship between machine learning and hyperspectral images for future research.
Year: 2022 PMID: 35528334 PMCID: PMC9071975 DOI: 10.1155/2022/3854635
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1The statistical pie-charts of screened articles on ML/DL techniques used for HSI classification (source: SCI, SCIE, Scopus, WoS).
Figure 2The statistical bar graph of screened articles on ML/DL techniques used for HSI classification from 2015 to 2021 (source: SCI, SCIE, Scopus, WoS): (a) ML; (b) DL.
Figure 3The categories of the eminent machine learning techniques used for HSI classification.
Figure 4Classification strategy by multiclass SVM.
Summary of review of HSI classification using SVM.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2011 | Multiclass SVM [ | San Diego3—98.86% | Outperforms traditional SVM and deals better with the Hughes effect |
| 2012 | Fuzzy decision tree-support vector machine (FDT-SVM) [ | Washington DC mall—94.35% | Efficient testing accuracy, reduced computational and storage demands, interpretable structures, and mitigation of the Hughes effect |
| 2014 | Semi-supervised SVM kernel-spectral fuzzy C-means (KSFCM) [ | IP—98.52% | Enhanced classification and clustering by fully exploiting both labeled and unlabeled samples |
| 2014 | SVM-radial basis function (SVM-RBF) [ | IP—88.7%, UP—94.7% | Outperforms other existing kernel-based methods |
| 2015 | Regional kernel-based SVM (RKSVM) [ | UP—95.40%, IP—92.55% | Outperforms pixel-point-based SVM-CK |
| 2017 | Multiscale segmentation of super-pixels (MSP-SVMsub) [ | UP: MSP-SVMsub—97.57%, IP: MSP-SVMsub—95.28% | Addresses the difficulty of classic OBIC-based methods in determining an appropriate segmentation size and reduces the Hughes phenomenon |
| 2018 | Extended morphological profiles (EMP), differential morphological profiles (DMP), Gabor filtering with SVM [ | UP: MFSVM-GF—98.46%, IP: MFSVM-GF—98.01% | Outperforms several advanced classifiers: SVM, super-pixel-based SVM, SVM-CK, multifeature SVM, EPF |
| 2019 | SVM-PCA [ | IP—91.37%, UP—98.46% | Outperforms Naïve Bayes, decision tree, and k-NN |
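The sketch below illustrates the basic recipe behind these SVM-based classifiers: treat each pixel's spectrum as a feature vector, scale the bands, and fit an RBF-kernel SVM for multiclass prediction. It is a minimal illustration with synthetic data standing in for a real scene; the dataset sizes and hyperparameters are assumptions, not the surveyed papers' configurations.

```python
# Illustrative sketch (not a surveyed method): multiclass SVM classification
# of hyperspectral pixel spectra with an RBF kernel, on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_pixels, n_bands, n_classes = 1500, 200, 5   # hypothetical scene size

# Fake "spectra": a class-dependent mean curve plus noise.
means = rng.normal(size=(n_classes, n_bands))
y = rng.integers(0, n_classes, size=n_pixels)
X = means[y] + 0.5 * rng.normal(size=(n_pixels, n_bands))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Scale bands, then fit an RBF-kernel SVM; scikit-learn handles the
# multiclass case internally via one-vs-one voting.
scaler = StandardScaler().fit(X_tr)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(scaler.transform(X_tr), y_tr)

print("overall accuracy:", accuracy_score(y_te, clf.predict(scaler.transform(X_te))))
```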
Summary of review of HSI classification using sparse representation.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2013 | Kernel sparse representation classification (KSRC) [ | IP—96.8%, UP—98.34%, KSC—98.95% | Lacks automatic selection of the window size for spatial image quality and of the filtering degree of class spatial relations |
| 2014 | Multiscale adaptive sparse representation (MASR) [ | UP—98.47%, IP—98.43%, SV—97.33% | MASR outperformed the single-scale JSRM approach and several other classifiers in classification maps and accuracy. Future scope: a more inclusive structural dictionary trained with discriminative learning algorithms |
| 2015 | Sparse multinomial logistic regression (SMLR) [ | IP—97.71%, UP—98.69% | As a pixelwise supervised method, it performs better than other contemporary methods. Future scope: more technical validation, exploitation of MRF, and a structured sparsity-inducing norm to enhance the interpretability, stability, and identity of the learned model |
| 2015 | Super-pixel-based discriminative sparse model (SBDSM) [377] | IP—97.12%, SV—99.37%, UP—97.33%, Washington DC mall—96.84% | Harnesses spatial context effectively through the super-pixel concept, improving speed and classification accuracy. Future scope: a systematic way to adapt the super-pixel count to various conditions and application of SR to other remote sensing tasks |
| 2015 | Shape-adaptive joint sparse representation classification (SAJSRC) [ | IP—98.45%, UP—98.16%, SV—98.53% | A shape-adaptive local region for each test pixel, rather than a fixed square window, adaptively explores spatial context and makes the method outperform its counterparts. Future scope: shape-adaptive region searching on the actual HSI, instead of the reduced-dimensional map, to explore complete spatial information |
| 2017 | Multiple-feature-based adaptive sparse representation (MFASR) [ | IP—97.99%, UP—98.39%, Washington DC mall—97.26% | Full utilization of all joint features embedded in shape-adaptive regions makes the method superior to some cutting-edge approaches. Future scope: automatic feature selection and improved dictionary learning to reduce computational cost |
| 2018 | Weighted joint nearest neighbor and joint sparse representation (WJNN-JSR) [ | UP—97.42%, IP—93.95%, SV—95.61%, Pavia center—99.27% | Improved via Gaussian weighting and incorporation of the conventional test-pixel area to achieve a new measure of classification knowledge, the Euclidean-weighted joint size. Future scope: more effective ways of applying the system and further increasing classification accuracy |
| 2019 | Log-Euclidean kernel-based joint sparse representation (LogEKJSR) [ | IP—97.25%, UP—99.06%, SV—99.36% | Extracts covariance features from a spatial square neighborhood and measures the similarity of covariance matrices with a conventional Gaussian kernel. Future scope: adaptive local regions via super-pixel segmentation and kernel learning via multiple kernel learning methods |
| 2019 | Multiscale super-pixels and guided filter (MSS-GF) [ | IP—97.58%, UP—99.17% | Captures spatial and edge details in HSI effectively; multiple regional scales build MSSs to acquire accurate spatial information, and GF corrects near-edge misclassifications in the classification maps. Future scope: more efficient methods for local feature extraction and super-pixel segmentation |
| 2019 | Joint sparse representation—self-paced learning (JSR-SPL) [ | IP—96.60%, SV—98.98% | More precise and reliable findings than other JSR methods |
| 2019 | Maximum-likelihood estimation based JSR (MLEJSR) [ | IP—96.69%, SV—98.91%, KSC—97.13% | Robust to outliers |
| 2020 | Global spatial and local spectral similarity-based manifold learning-group sparse representation-based classifier (GSLS-ML-GSRC) [ | UP—93.42%, Washington DC mall—91.64%, SV—93.79% | The fusion makes the method outperform contemporary methods focused only on nonlocal or local similarities |
| 2020 | Sparse-adaptive hypergraph discriminant analysis (SAHDA) [ | Washington DC mall—95.28% | Effectively depicts the multiple complicated aspects of HSI; spatial knowledge will be considered in future work |
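The core idea these methods share is sparse representation classification (SRC): code a test spectrum over a dictionary of training spectra and assign the class whose atoms reconstruct it best. Below is a minimal sketch of that idea using orthogonal matching pursuit on synthetic data; the dictionary sizes and sparsity level are assumptions, and no specific surveyed variant is implemented.

```python
# Illustrative sketch of sparse representation classification (SRC):
# a test spectrum is coded over a dictionary of training spectra with
# orthogonal matching pursuit, then assigned to the class whose atoms
# give the smallest reconstruction residual. Data are synthetic.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
n_bands, per_class, n_classes = 120, 30, 4
means = rng.normal(size=(n_classes, n_bands))

# Dictionary: columns are l2-normalized training spectra, grouped by class.
D = np.hstack([(means[c][:, None] + 0.3 * rng.normal(size=(n_bands, per_class)))
               for c in range(n_classes)])
D /= np.linalg.norm(D, axis=0)
labels = np.repeat(np.arange(n_classes), per_class)

def src_predict(x, n_nonzero=10):
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False).fit(D, x)
    alpha = omp.coef_
    residuals = []
    for c in range(n_classes):
        a_c = np.where(labels == c, alpha, 0.0)   # keep class-c coefficients
        residuals.append(np.linalg.norm(x - D @ a_c))
    return int(np.argmin(residuals))

x_test = means[2] + 0.3 * rng.normal(size=n_bands)
print("predicted class:", src_predict(x_test))   # expected: 2
```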
Figure 5Given the green nodes, the black node is independent of other nodes.
Summary of review of HSI classification using MRF.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2011 | Adaptive MRF (a-MRF) [ | IP—92.55% | Handles the "salt and pepper" homogeneity problem and the possibility of an overcorrection effect at class boundaries |
| 2014 | Hidden MRF and SVM (HMRF-SVM) [ | IP—90.50%, SV—97.24% | Outperforms SVM, improving overall accuracy by nearly 8% and 3.2%, respectively |
| 2014 | Probabilistic SR with MRF-based multiple linear logistic (PSR-MLL) [ | IP—97.8%, UP—99.1%, Pavia center—99.4% | Exceeds other contemporary methods in accuracy |
| 2014 | MRF with Gaussian mixture model (GMM-MRF) [ | UP (LFDA-GMM-MRF)—90.88%, UP (LPNMF-GMM-MRF)—94.96% | Advantageous across a vast range of operating conditions; spatial-spectral information preserves multimodal statistics. Future scope: GMM classification distributions |
| 2011 | MRF with sparse multinomial logistic regression classifier and spatially adaptive total variation regularization (MRF-SMLR-SpATV) [ | UP—90.01%, IP—97.85%, Pavia center—99.23% | Efficient time complexity. Future scope: GPU implementation and dictionary learning |
| 2016 | Multitask joint sparse representation (MJSR) and a stepwise Markov random field framework (MSMRF) [ | IP—92.11%, UP—92.52% | Gradual optimization explores the spatial correlation, significantly improving classification effectiveness and accuracy |
| 2016 | MRF with hierarchical statistical region merging (HSRM) [ | SVMMRF-HSRM: IP—93.10%, SV—99.15%, UP—86.52%; MLRsubMRF-HSRM: IP—82.60%, SV—88.16%, UP—95.52% | A better solution than majority voting, which suffers from the scale-choice problem. Future scope: spatial features in the spatial prior model of objects of different groups |
| 2018 | Integration of optimum dictionary learning with extended hidden Markov random field (ODL-EMHRF) [ | ODL-EMHRF-ML: IP—98.56%, UP—99.63%; ODL-EMHRF-EM: IP—98.47%, UP—99.58% | Proven better than SVM-associated EMRF |
| 2018 | Label-dependent spectral mixture model fused with MRF (LSMM-MRF) [ | Konka image—94.19%, shipping scene—66.45% | Efficient unsupervised classification strategy that considers the spectral information in mixed pixels and the impact of spatial correlation. Future scope: enhanced theoretical derivation of the EM steps |
| 2019 | Adaptive interclass-pair penalty and spectral similarity information (aICP2-SSI) with MRF and SVM [ | UP—98.10%, SV—96.40%, IP—96.14% | Outperforms other MRF-based methods. Future scope: more efficient edge-preserving strategies, spectral-similarity measures, and class-separability calculation methods |
| 2019 | Cascaded MRF (CMRF) [ | IP—98.56%, Botswana—99.32%, KSC—99.24% | Backpropagation tunes the model parameters at the least computational expense |
| 2020 | Fusion of transfer learning and MRF (TL-MRF) [ | IP—93.89%, UP—91.79% | TL proves very effective for HSI classification. Future scope: reducing the number of calculations in the existing model |
| 2020 | MRF with capsule network (caps-MRF) [ | IP—98.52%, SV—99.74%, Pavia center—99.84% | Preserves relevant information; the spatial constraint of the MRF helps achieve more precise model convergence. Future scope: combining CapsNet with several postclassification techniques |
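A common role of the MRF in these pipelines is spatial regularization of a pixelwise classifier's output, as in Figure 5's neighborhood dependency. The sketch below shows one simple way to realize that idea, a Potts-style prior optimized with iterated conditional modes (ICM); the inputs, the smoothing weight, and ICM itself are illustrative assumptions rather than any specific surveyed model.

```python
# Illustrative sketch of MRF-based spatial regularization: pixelwise
# class log-likelihoods are smoothed with a Potts prior using iterated
# conditional modes (ICM). Hypothetical inputs; not a surveyed method.
import numpy as np

def icm_smooth(log_prob, beta=1.0, n_iter=5):
    """log_prob: (H, W, C) pixelwise log-likelihoods; returns (H, W) labels."""
    H, W, C = log_prob.shape
    labels = log_prob.argmax(axis=-1)
    for _ in range(n_iter):
        for i in range(H):
            for j in range(W):
                # Count 4-neighborhood agreement for each candidate class.
                neigh = [labels[x, y] for x, y in
                         ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= x < H and 0 <= y < W]
                agree = np.bincount(neigh, minlength=C)
                # Data term plus weighted neighborhood agreement.
                labels[i, j] = np.argmax(log_prob[i, j] + beta * agree)
    return labels

rng = np.random.default_rng(2)
noisy = rng.normal(size=(20, 20, 3))          # stand-in for classifier scores
print(icm_smooth(noisy).shape)                # (20, 20)
```

Larger `beta` yields smoother maps, which is how such models suppress "salt and pepper" noise at the risk of overcorrecting class boundaries.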
Summary of review of HSI classification using ELM.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2014 | Ensemble extreme learning machines (E2LM): bagging-based ELMs (BagELMs) and AdaBoost-based ELMs (BoostELMs) [ | UP—94.3%, KSC—97.71%, SV—97.19% | BoostELM performs better than kernel and other ensemble methods. Future scope: performance of other differentiable or nondifferentiable activation functions |
| 2015 | Kernel-based ELM with composite kernel (KELM-CK) [ | IP—95.9%, UP—93.5%, SV—96.4% | Outperforms other SVM-CK-based models |
| 2015 | Two-level fusion of ELM classifiers: feature-level fusion (FF-ELM) and decision-level fusion (DF-ELM) [ | FF-ELM: UP—98.11%, IP—92.93%, SV—99.12%; DF-ELM: UP—99.25%, IP—93.58%, SV—99.63% | Outperforms basic ELM models |
| 2016 | Hierarchical local-receptive-field-based ELM (HL-ELM) [ | IP—98.36%, UP—98.59% | Surpasses other ELM methods in accuracy and training speed |
| 2017 | Genetic-firefly algorithm with ELM (3FA-ELM) [ | HyDice DC mall—97.36%, HyMap—95.58% | Low complexity (ELM) with better adaptability and search capability (FA). Future scope: reduced execution time |
| 2017 | Local-receptive-field-based kernel ELM (LRF-KELM) [ | IP—98.29% | Outperforms other ELM models |
| 2017 | Distributed KELM based on the MapReduce framework with Gabor filtering (DK-Gabor-ELMM) [ | IP—92.8%, UP—98.8% | Outperforms other ELM models |
| 2017 | Loopy belief propagation with ELM (ELM-LBP) [ | IP—97.29% | Efficient time complexity |
| 2018 | Mean filtering with RBF-based KELM (MF-KELM) [ | IP—98.52% | Offers the lowest computational hazard |
| 2018 | Augmented sparse multinomial logistic ELM (ASMLELM) [ | IP—98.85%, UP—99.71%, SV—98.92% | Improved classification accuracy through extended multi-attribute profiles and enhanced SR |
| 2018 | ELM with enhanced composite features (ELM-ECF) [ | IP—98.8%, UP—99.7%, SV—99.5% | Low complexity; multiscale spatial features improve accuracy. Future scope: incorporating feature-fusion technology |
| 2019 | Local block multilayer sparse ELM (LBMSELM) [ | IP—89.31%, UP—89.47%, SV—90.03% | Performs anomaly and target detection; inverse-free computation, saliency detection, and gravitational search reduce computational overhead and increase classification accuracy |
| 2019 | ELM-based heterogeneous domain adaptation (EHDA) [ | HU-DC—97.51%, UP-DC—96.63%, UP-HU—97.53% | Outperforms other HDA methods; invariant feature selection |
| 2019 | Spectral-spatial domain-specific convolutional deep ELM (S2CDELM) [ | IP—97.42%, UP—99.72% | Easy construction with high training/testing speed. Future scope: merging DL with ELM |
| 2020 | Cumulative variation weights and comprehensively evaluated ELM (CVW-CEELM) [ | IP—98.5%, UP—99.4% | Accuracy achieved by weighting multiple weak classifiers. Future scope: multiscale neighborhood choice and optimized feature selection |
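What makes ELMs fast is that the hidden layer is random and fixed, so training reduces to one least-squares solve for the output weights. The sketch below shows that basic mechanism on synthetic data; the layer sizes and sigmoid activation are assumptions, and none of the surveyed variants (kernels, ensembles, receptive fields) is included.

```python
# Illustrative sketch of a basic extreme learning machine (ELM): random
# hidden weights, sigmoid activation, and output weights solved in one
# shot with a pseudoinverse. Synthetic data; not a surveyed variant.
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_bands, n_hidden, n_classes = 600, 100, 256, 4

X = rng.normal(size=(n_samples, n_bands))
y = rng.integers(0, n_classes, size=n_samples)
T = np.eye(n_classes)[y]                     # one-hot targets

W = rng.normal(size=(n_bands, n_hidden))     # random, never trained
b = rng.normal(size=n_hidden)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # hidden-layer output

beta = np.linalg.pinv(H) @ T                 # single least-squares solve
pred = (H @ beta).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```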
Figure 6Principle of active learning.
Summary of review of HSI classification using active learning.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2008 | AL with expectation-maximization binary hierarchical classifier (BHC-EM-AL) and maximum likelihood (ML-EM-AL) [ | Range: KSC—90–96%, Botswana—94–98% | Better learning than random selection of data points and entropy-based AL. Future scope: measuring the efficacy of the AL-based knowledge-transfer approach while systematically increasing the spatial/temporal separation of the data sources |
| 2010 | Semi-supervised segmentation with AL and multinomial logistic regression (MLR-AL) [ | IP—79.90%, SV—97.47% | Innovative mechanisms for automatically selecting unlabeled training samples; AL enhances the segmentation results. Future scope: testing the segmentation in scenarios with limited a priori availability of training images |
| 2013 | Maximizer of the posterior marginal by loopy belief propagation with AL (MPM-LBP-AL) [ | IP—94.76%, UP—85.78% | Improved accuracy over previous AL applications. Future scope: parallel computing architectures such as commodity clusters or GPUs for a computationally efficient implementation |
| 2015 | Hybrid AL-MRF: uncertainty sampling with breaking ties (MRF-AL-BT), passive random sampling (MRF-AL-RS), and their combination (MRF-AL-BT + RS) [ | IP—94.76%, UP—85.78% (MRF-AL-RS provides the highest accuracies) | Outperforms conventional AL and SVM-AL methods owing to MRF regularization and pixelwise output. Future scope: merging the model with other effective AL methods and testing with a limited number of training samples |
| 2015 | Integration of AL and a Gaussian process classifier (GP-AL) [ | IP—89.49%, Pavia center—98.22% | Empirical automation of AL achieves reasonable accuracy. Future scope: adding a diversity criterion to the heuristics, adding contextual information to the model, and reducing computation time |
| 2016 | AL with a hierarchical segmentation (HSeg) tree: adding features and adding samples (Adseg_AddFeat + AddSamp) [ | IP—82.77%, UP—92.23% | Outperforms several baseline methods by selecting appropriate training data from existing labeled datasets, potentially decreasing manual labeling effort. Future scope: reducing the computational time that limits applicability to large-scale datasets |
| 2016 | Multiview 3D redundant discrete wavelet transform-based AL (3D-RDWT-MV-AL) [ | HU—99%, KSC—99.8%, UP—95%, IP—90% | Combining an initialization process with AL improves classification |
| 2017 | Discovering representativeness and discriminativeness by semi-supervised active learning (DRDbSSAL) [ | Botswana—97.03%, KSC—93.47%, UP—93.03%, IP—88.03% | Novel approach with efficient accuracy |
| 2017 | Multicriteria AL [ | KSC—99.71%, UP—99.66%, IP—99.44% | Surpasses other existing AL methods in stability, accuracy, robustness, and computational cost. Future scope: a multi-objective optimization strategy and the use of advanced attribute-profile features |
| 2018 | Feature-driven AL with morphological profiles and a Gabor filter [ | IP—99.5%, UP—99.84%, KSC—99.53% (Gabor-BT) | A discriminative feature space is designed to gather helpful information from restricted samples |
| 2018 | Multiview intensity-based AL (MVAL) with a multiview intensity-based query-representative strategy (MVIQ-R) [ | UP—98%, Botswana—99.5%, KSC—99.9%, IP—95% | The focus on pixel intensity obtains unique features and hence better performance. Future scope: selecting an optimal combination of attribute features |
| 2019 | Super-pixel with density peak augmentation (DPA)-based semi-supervised AL (SDP-SSAL) [ | IP—90.08%, UP—85.61% | Novel approach based on a super-pixel density metric. Future scope: a pixelwise solution for producing super-pixel-based neighborhoods |
| 2020 | Adaptive multiview ensemble spectral classifier with hierarchical segmentation (Ad-MVEnC_Spec + HSeg) [ | KSC—97.63%, IP—87.1%, HU—93.3% | Enhances view sufficiency and promotes the disagreement level through dynamic views; lower computational complexity due to parallel computing |
| 2020 | Spectral-spatial feature fusion using spatial-coordinate-based AL (SSFFSC-AL) [ | IP—100%, UP—98.43% | High running speed; successfully addresses the "salt and pepper" phenomenon but degrades slightly when samples of the same class are distributed across different regions. Future scope: converting the sampling weight into an adaptive parameter adjusted as the training samples are modified |
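Several of the methods above use "breaking ties" uncertainty sampling: query the pool samples whose top-two class probabilities are closest. The loop below sketches that pool-based cycle from Figure 6 on synthetic data; the classifier, batch size, and number of rounds are illustrative assumptions, with labels served by a simulated oracle.

```python
# Illustrative sketch of pool-based active learning with breaking-ties
# uncertainty sampling. Synthetic data; the surveyed papers use various
# classifiers and query heuristics.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 50))
w = rng.normal(size=(50, 3))
y = (X @ w).argmax(axis=1)                    # synthetic 3-class labels

labelled = list(rng.choice(len(X), size=20, replace=False))
pool = [i for i in range(len(X)) if i not in labelled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=500).fit(X[labelled], y[labelled])
    proba = clf.predict_proba(X[pool])
    top2 = np.sort(proba, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]          # small margin = uncertain
    query = [pool[k] for k in np.argsort(margin)[:20]]
    labelled += query                          # "oracle" provides y[query]
    pool = [i for i in pool if i not in query]
    print(f"round {round_}: accuracy on pool = {clf.score(X[pool], y[pool]):.3f}")
```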
Figure 7The network structure of stacked autoencoders; input X-to-E is the encoding phase; E-to-output Y is the decoding phase.
Summary of the review of HSI classification using deep learning—AE.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2013 | Autoencoders (AE) [ | Error rate: KSC—4%, Pavia city—14.36% | Opened a considerable doorway for research, including other deep models for better accuracy |
| 2014 | Stacked autoencoder and logistic regression (SAE-LR) [ | KSC—98.76%, Pavia city—98.52% | Highly accurate compared with RBF-SVM and tests faster than SVM or KNN, but lacks training-time efficiency |
| 2016 | Spatially updated deep AE with a collaborative-representation-based classifier (SDAE-CR) [ | IP—99.22%, Pavia center—99.9%, Botswana—99.88% | Accurate, and highly structured in extracting deep features rather than hand-crafted ones. Future scope: improving the deep network architecture and the selection of parameters |
| 2019 | Compact and discriminative stacked autoencoder (CDSAE) [ | UP—97.59%, IP—95.81%, SV—96.07% | Efficient in dealing with a low-dimensional feature space, but the computation cost grows with the architecture size |
| 2021 | Stacked autoencoder with distance-based spatial-spectral vectors [ | SV—97.93%, UP—99.34%, Surrey—94.31% | Augmenting EMAP features with geometrically allocated spatial-spectral feature vectors achieves excellent results; better hyperparameter tuning and more powerful computational tools are required. Future scope: a more unified, generalized, and accurate training model |
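Figure 7's X-to-E encoding and E-to-output-Y decoding can be sketched as one autoencoder layer trained to reconstruct its input; stacked models repeat this layer by layer before attaching a classifier such as logistic regression. The PyTorch sketch below is a minimal illustration with assumed layer sizes and random stand-in spectra, not a surveyed architecture.

```python
# Illustrative PyTorch sketch of one autoencoder layer of the kind
# stacked in SAE-based HSI classifiers: X -> E (encode), E -> Y (decode),
# trained to reconstruct the input. Hypothetical sizes and data.
import torch
import torch.nn as nn

n_bands, n_hidden = 200, 64
encoder = nn.Sequential(nn.Linear(n_bands, n_hidden), nn.Sigmoid())
decoder = nn.Linear(n_hidden, n_bands)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.randn(512, n_bands)                 # stand-in for pixel spectra
for epoch in range(50):
    E = encoder(X)                            # encoding phase
    Y = decoder(E)                            # decoding phase
    loss = loss_fn(Y, X)                      # reconstruction objective
    opt.zero_grad(); loss.backward(); opt.step()
print("final reconstruction MSE:", float(loss))
```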
Figure 8The CNN architecture deploying the layers.
Comparison of convolutional layers.
| Arguments | Convolution layer | Pooling layer | Fully connected layer |
|---|---|---|---|
| Input | (i) 3D cube, the preceding set of feature maps | (i) 3D cube, the preceding set of feature maps | (i) Flattened 3D cube, the preceding set of feature maps |
| Parameters | (i) Kernel count (ii) Kernel size (iii) Activation function (ReLU) (iv) Stride (v) Padding (vi) Type and value of regularization | (i) Stride (ii) Window size | (i) Number of nodes (ii) Activation function, selected by the layer's role: ReLU for aggregating information, softmax for producing the final classification |
| Action | (i) Application of filters made of small kernels to extract features (ii) Learning (iii) One bias per filter (iv) Application of the activation function to each feature-map value | (i) Dimensionality reduction (ii) Extraction of the maximum or average of a region (iii) Sliding-window framework | (i) Aggregation of information from the final feature maps (ii) Generation of the final classification |
| Output | (i) 3D cube, one 2D map per filter | (i) 3D cube, one 2D map per filter, with reduced spatial dimensions | (i) Vector of class scores |
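The table's conv-pool-FC pipeline can be written directly as a tiny network over HSI patches. The PyTorch sketch below is an assumed toy configuration (a PCA-reduced band count, 9×9 patches, one layer of each kind), intended only to make the table's arguments concrete.

```python
# Illustrative PyTorch sketch matching the table above: a convolution
# layer, a pooling layer, and a fully connected layer applied to a
# hypothetical HSI patch treated as a multi-band 2D input.
import torch
import torch.nn as nn

n_bands, patch, n_classes = 30, 9, 5          # e.g., PCA-reduced bands

net = nn.Sequential(
    nn.Conv2d(n_bands, 16, kernel_size=3, padding=1),  # kernels, stride, padding
    nn.ReLU(),                                         # activation per feature map
    nn.MaxPool2d(kernel_size=2, stride=2),             # window size + stride
    nn.Flatten(),                                      # flatten the 3D cube
    nn.Linear(16 * (patch // 2) ** 2, n_classes),      # number of nodes
)

x = torch.randn(8, n_bands, patch, patch)     # batch of 9x9 patches
logits = net(x)                               # (8, n_classes) class scores
print(logits.shape)       # softmax is applied inside CrossEntropyLoss at training
```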
Summary of review of HSI classification using deep learning—CNN.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2015 | Convolutional neural network and multilayer perceptron (CNN-MLP) [ | Pavia city—99.91%, UP—99.62%, SV—99.53%, IP—98.88% | Far better than SVM and mixed RBF classifiers; the effective convergence rate can be useful for large datasets. Future scope: detecting human behavior from hyperspectral video sequences |
| 2016 | 3D-CNN [ | IP—98.53%, UP—99.66%, KSC—97.07% | A landmark in quality and overall performance. Future scope: accelerating mapping performance through postclassification processing |
| 2016 | Spectral-spatial feature-based classification (SSFC) [ | Pavia center—99.87%, UP—96.98% | More accurate than other methods. Future scope: including an optimal observation scale for improved outcomes |
| 2016 | CNN-based simple linear iterative clustering (SLIC-CNN) [ | KSC—100%, UP—99.64%, IP—97.24% | Deals with a limited dataset by using spectral and local-spatial probabilities as an enhanced estimate in Bayesian inference |
| 2017 | Pixel-pair feature enhanced deep CNN (CNN-PPF) [ | IP—94.34%, SV—94.8%, UP—96.48% | Overcomes the large-parameter and bulk-data problems of DL; PPFs make the system unique and reliable, and the voting strategy yields more refined classification evaluations |
| 2017 | Multiscale 3D deep convolutional neural network (M3D-DCNN) [ | IP—97.61%, UP—98.49%, SV—97.24% | Outperforms popular methods such as RBF-SVM and CNN combinations. Future scope: removing data limitations and improving the network architecture |
| 2018 | 2D-CNN, 3D-CNN, recurrent 2D-CNN (R-2D-CNN), and recurrent 3D-CNN (R-3D-CNN) [ | IP—99.5%, UP—99.97%, Botswana—99.38%, Pavia center—96.79%, SV—99.8%, KSC—99.85% | R-3D-CNN outperforms all the other CNNs listed and proves potent in both fast convergence and feature extraction but suffers from the limited-sample problem. Future scope: applying prior knowledge and transfer learning |
| 2019 | 3D lightweight convolutional neural network (3D-LWNet) [ | UP—99.4%, IP—98.87%, KSC—98.22% | Robust to the source of the data. Future scope: improving the architecture with intelligent algorithms |
| 2020 | Hybrid spectral CNN (HybridSN) [ | IP—99.75%, UP—99.98%, SV—100% | Removes the shortfalls of skipping essential spectral bands and of the complex, tedious structures of pure 2D-CNNs and 3D-CNNs, and outperforms other contemporary CNN methods such as SSRN and M3D-DCNN |
| 2020 | Heterogeneous TL based on CNN with an attention mechanism (HT-CNN-attention) [ | SV—99%, UP—97.78%, KSC—99.56%, IP—96.99% | Efficient approach regardless of the sample selection strategy chosen |
| 2020 | Quantum-genetic-optimized SR-based CNN (QGASR-CNN) [ | UP—91.6%, IP—94.1% | Enhanced accuracy; overfitting and "salt-and-pepper" noise are resolved. Future scope: improving operational performance through the relation between feature mapping and parameter selection |
| 2020 | Rotation-equivariant 2D CNN (reCNN2D) [ | IP—97.78%, UP—98.89%, SV—98.18% | Robustness, optimal generalization, and accuracy without any data augmentation |
| 2020 | Spectral-spatial dense-connectivity-attention 3D-CNN (SSDANet) [ | UP—99.97%, IP—99.29% | Higher accuracy but high computational hazard. Future scope: optimization with other efficient algorithms |
Figure 9The RNN structure with recurrent neurons.
Summary of review of HSI classification using deep learning—RNN.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2017 | Gated recurrent unit-based RNN with parametric rectified tanh as the activation function (RNN-GRU-pretanh) [ | UP—88.85%, HU—89.85%, IP—88.63% | An enhanced model that utilizes the intrinsic sequential features of HS pixels with better accuracy than SVM, though the study is limited to spectral features only. Future scope: a deep end-to-end convolutional RNN with both spatial and spectral features |
| 2019 | Spectral-spatial cascaded recurrent neural network (SSCasRNN) [ | IP—91.79%, UP—90.30% | Outperforms pure RNN and CNN models owing to the placement of convolutional and recurrent layers to explore joint information |
| 2020 | Geometry-aware deep RNN (Geo-DRNN) [ | UP—98.05%, IP—97.77% | Encodes complex geometric structures at the cost of memory space. Future scope: minimizing memory occupation |
| 2021 | 2D and 3D spatial-attention-driven recurrent feedback convolutional neural network (SARFNN) [ | IP—99.15%, HU—86.05% | Integrating attention and feedback mechanisms with recurrent nets in 2D and 3D layers enables efficient accuracy |
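The common idea behind RNN-based HSI classifiers is to read a pixel's spectrum band by band as a sequence. The PyTorch sketch below shows that spectral-sequence framing with a GRU; the hidden size and Pavia-like dimensions are assumptions, and none of the surveyed attention or cascade mechanisms is included.

```python
# Illustrative PyTorch sketch of the RNN-for-HSI idea: each pixel's
# spectrum is fed band by band to a GRU, and the final hidden state
# feeds a classifier. Hypothetical sizes, not a surveyed model.
import torch
import torch.nn as nn

n_bands, n_classes = 103, 9                   # e.g., Pavia-like dimensions

class SpectralGRU(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # x: (batch, n_bands)
        seq = x.unsqueeze(-1)                 # (batch, n_bands, 1): one band per step
        _, h = self.gru(seq)                  # h: (1, batch, hidden)
        return self.head(h.squeeze(0))        # class logits

logits = SpectralGRU()(torch.randn(4, n_bands))
print(logits.shape)                           # torch.Size([4, 9])
```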
Figure 10The detailed DBN structure.
Summary of review of HSI classification using deep learning—DBN.
| Year | Method used | Dataset and COA | Research remarks |
|---|---|---|---|
| 2015 | Deep belief network and logistic regression (DBN-LR) [ | IP—95.95%, Pavia city—99.05% | Despite a drawback in training-time complexity, its super-fast testing and result-generating capability outperform RBF-SVM with EMP |
| 2019 | Spectral-adaptive segmented deep belief network (SAS-DBN) [ | UP—93.15%, HU—98.35% | Capable of addressing the complexities and other side effects of limited samples |
| 2020 | Conjugate-gradient-update-based DBN (CGDBN) [ | UP—97.31% | Better stability and convergence of the training model, but high time complexity |
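A DBN (Figure 10) is built by stacking restricted Boltzmann machines (RBMs), each pretrained greedily before a classifier fine-tunes the stack. The numpy sketch below shows one RBM trained with single-step contrastive divergence (CD-1) on synthetic binary data; biases are omitted and all sizes are assumptions, so this is a pedagogical fragment rather than a full DBN.

```python
# Illustrative numpy sketch of one restricted Boltzmann machine (RBM)
# trained with CD-1; a DBN stacks several such layers and fine-tunes
# with a classifier (e.g., logistic regression). Biases omitted for brevity.
import numpy as np

rng = np.random.default_rng(5)
n_visible, n_hidden, lr = 64, 32, 0.05
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

V = (rng.random((256, n_visible)) < 0.3).astype(float)   # stand-in binary data
for epoch in range(30):
    # Positive phase: hidden activations given the data.
    ph = sigmoid(V @ W)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one Gibbs step back to visible and hidden again.
    pv = sigmoid(h @ W.T)
    ph2 = sigmoid(pv @ W)
    # Contrastive-divergence weight update.
    W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
print("weight norm after CD-1 training:", np.linalg.norm(W))
```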
Figure 11The GAN architecture.
Summary of review of HSI classification using deep learning—GAN.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2018 | Hyperspectral 1D generative adversarial network (HSGAN) [ | IP—83.53% | Outperforms CNN, KNN, etc. |
| 2018 | 3D augmented GAN [ | SV—93.67%, IP—91.1%, KSC—98.12% | Data augmentation solves the overfitting problem and improves class accuracy |
| 2019 | Conditional GAN with a conditional variational AE (CGAN-CVAE) [ | UP—83.85%, DC mall—89.36% | The semi-supervised, ensemble-prediction technique ensures the model's training under limited-sample conditions |
| 2020 | Semi-supervised variational GAN (SSVGAN) [ | UP—84.35%, Pavia center—97.15%, DC mall—92.21%, Jiamusi—64.76% | Outperforms other GAN variants (CVAEGAN and ACGAN) but suffers from feature-matching, overfitting, and convergence problems. Future scope: correction through metric learning |
| 2020 | Spectral-spatial GAN with a conditional random field (SS-GANCRF) [ | IP—96.3%, UP—99.31% | Enhanced classification capability. Future scope: an end-to-end training system with graph constraints placed on the convolutional layers |
| 2021 | Adaptive weighting feature-fusion generative adversarial network (AWF2-GAN) [ | IP—97.53%, UP—98.68% | Explores and fuses the entire joint feature space; the joint loss function and center loss gain intraclass sensitivity from local neighboring areas and offer an efficient spatial-regularization outcome |
| 2021 | Variational generative adversarial network with crossed spatial and spectral interactions (CSSVGAN) [ | IP—93.61%, UP—99.11%, SV—97% | Increased classification potential by utilizing a transformer and a GAN |
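The adversarial game of Figure 11 applied to 1D spectra, as in HSGAN-style models, can be sketched in a few lines: a generator maps noise to fake spectra while a discriminator learns to tell them from real ones. The PyTorch sketch below uses assumed network sizes and random stand-in spectra; real HSI GANs add convolutional, conditional, and semi-supervised machinery on top.

```python
# Illustrative PyTorch sketch of a minimal GAN on 1D spectra: generator
# G maps noise to fake spectra, discriminator D scores real vs. fake.
import torch
import torch.nn as nn

n_bands, z_dim = 200, 32
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, n_bands))
D = nn.Sequential(nn.Linear(n_bands, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, n_bands)               # stand-in for real spectra
for step in range(100):
    # Discriminator step: real -> 1, fake -> 0.
    fake = G(torch.randn(64, z_dim)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: try to fool the discriminator (fake -> 1).
    fake = G(torch.randn(64, z_dim))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print("final D/G losses:", float(loss_d), float(loss_g))
```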
Figure 12The principle of transfer learning.
Summary of review of HSI classification using transfer learning.
| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2018 | Deep mapping-based heterogeneous transfer learning model (DLTM) [ | Washington DC mall—96.25% | Capable of binary classification. Future scope: extension to multiclass classification |
| 2018 | AL with a stacked sparse autoencoder (AL-SSAE) [ | UP—99.48%, Pavia center—99.8%, SV—99.45% | Both the source and target domains possess finely tuned hyperparameters. Future scope: further modification of the architectural parameters to enhance classification accuracy |
| 2020 | Heterogeneous TL based on CNN with an attention mechanism (HT-CNN-attention) [ | SV—99%, UP—97.78%, KSC—99.56%, IP—96.99% | Efficient approach regardless of the sample selection strategy chosen |
| 2020 | ELM-based ensemble transfer learning (TL-ELM) [ | UP—98.12%, Pavia center—96.25% | Efficient accuracy and transferability with high training speed. Future scope: inclusion of SuperPCA and knowledge transfer |
| 2020 | Lightweight shuffled group convolutional neural network (SG-CNN) [ | Botswana—99.67%, HU—99.4%, Washington DC—97.06% | Fine-tuned compared with standard CNN architectures; low computational cost for training. Future scope: inclusion of more grouped convolutional architectures |
| 2021 | Super-pixel pooling convolutional neural network with transfer learning (SP-CNN) [ | SV—95.99%, UP—93.18%, IP—94.45% | Greater parameter optimization and accuracy using a limited number of samples and very short training and testing times. Future scope: optimal super-pixel segmentation and merging with different CNN architectures |
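Figure 12's source-to-target knowledge transfer commonly takes the form of reusing a pretrained feature extractor and retraining only a new head on the scarce target-scene labels. The PyTorch sketch below illustrates that freeze-and-fine-tune recipe; `pretrained_backbone` is a hypothetical stand-in for a model actually trained on a source HSI dataset, and all sizes are assumptions.

```python
# Illustrative PyTorch sketch of the transfer-learning recipe: freeze a
# (hypothetical) pretrained feature extractor and train a new target head.
import torch
import torch.nn as nn

n_bands, n_target_classes = 30, 6
pretrained_backbone = nn.Sequential(          # stands in for a model trained
    nn.Linear(n_bands, 128), nn.ReLU(),       # on a source HSI scene
    nn.Linear(128, 64), nn.ReLU(),
)
for p in pretrained_backbone.parameters():
    p.requires_grad = False                   # freeze transferred knowledge

head = nn.Linear(64, n_target_classes)        # new, target-specific head
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(128, n_bands)                 # few target-domain samples
y = torch.randint(0, n_target_classes, (128,))
for epoch in range(20):
    logits = head(pretrained_backbone(X))
    loss = loss_fn(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
print("target-domain training loss:", float(loss))
```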
Comparison between ML and non-ML techniques for HSI classification.
| Methods | Advantages | Disadvantages |
|---|---|---|
| Classical state-of-the-art techniques | (i) Simple structure and design (ii) Less time consumption (iii) Easy to implement (iv) Skillful dimension handling via PCA and ICA, as sketched after this table (v) Better binary and moderate multiclass classification via kernels and SVM | (i) High space complexity due to the storage of bulk data (ii) Based on empirical identities, hence tedious work (iii) Feature selection and extraction are not accurate (iv) Suffers from the limited-labeled-sample problem, the Hughes phenomenon, and noise |
| Advanced machine learning techniques | (i) Handles high-dimensional data easily, removing the troubles of the Hughes phenomenon (ii) Works equally well with labeled and unlabeled samples (iii) Precise and meticulous choice of features (iv) High-end, precise models for real hypercubes, hence top-notch classification accuracy (v) Removes overfitting, noise, and other hurdles to a much greater extent (vi) Mimics the human brain to solve multiclass problems | (i) Model construction is difficult due to the complex network-like structure (ii) High time complexity due to training and testing on huge amounts of raw HSI data (iii) Extremely expensive design (iv) Strenuous to implement |
The advantages and challenges of the ML- and DL-based techniques for HSI classification.
| ML/DL techniques | Advantages | Challenges |
|---|---|---|
| Support vector machine | (i) Robust in terms of outliers, the Hughes effect, and dimensionality, as reduction is not strictly necessary [ | (i) Works very well for binary classification but struggles to generate accurate classes for multiclass problems [ |
| | (ii) Supports supervised, semi-supervised, and unsupervised problems with less overfitting risk [ | (ii) Training time is high for high-class datasets like HSI [ |
| | (iii) The sigmoid kernel form deals better than earlier kernels with unlabeled and unstructured HSI datasets [ | (iii) Difficulty in fine-tuning the parameters [ |
| | (iv) Capable of solving both binary and multiclass classification problems, outperforming several methods [ | (iv) Complex interpretability [ |
| | (v) Can improve performance when assisted by other supporting methods [ | (v) Lack of easy generalization to datasets with multiple classes [ |
| | | (vi) Complexity in building the model due to a lack of sufficient labeled samples [ |
| Sparse representation and classification | (i) A dictionary with relevant data is used for learning with a minimal number of optimal parameters [ | (i) Building the dictionary carries high expense overheads [ |
| | (ii) Builds precise and powerful classification models with higher interpretability through sparse coding [ | (ii) The dictionary or the coding might cause loss of information [ |
| | (iii) Optimized memory usage [ | (iii) Difficulty representing such high-profile, high-resolution image data as HSI through a sparse matrix [ |
| | (iv) Reduces the estimated variance between classes to produce better outcomes [ | |
| Markov random field | (i) Works well for a wide range of unstructured problems, with no direct dependency between classes and parameters [ | (i) Normalization of data can be hectic for high-dimensional data [ |
| | (ii) Better denoising effect [ | (ii) Suffers from the difficulty of training undirected data that may not be representable graphically [ |
| | (iii) Robust for both spatial and spectral distributions [ | (iii) Poor interpretability [ |
| | (iv) Low time complexity due to the graphical representation of data [ | |
| Extreme learning machines | (i) Less training time and a faster learning rate than previous methods [ | (i) Higher computational hazard [ |
| | (ii) Avoids local minima and finishes the job in a single iteration [ | (ii) A wrong choice of the optimal number of hidden-layer neurons may cause redundancy in the model and thus affect classification accuracy [ |
| | (iii) Advantageous against overfitting caused by the many bands in HSIs [ | (iii) Plenty of room remains for algorithmic advances to make it more compatible with HSI data [ |
| | (iv) Builds an enhanced model with better prediction performance at optimized expense [ | |
| | (v) Improved generalization ability, robustness, and controllability [ | |
| Active learning | (i) A very efficient way of learning for both supervised and semi-supervised problems [ | (i) Higher computational hazard [ |
| | (ii) Ease in segregating interclass and intraclass features through active query sets [ | (ii) A wrong choice of the optimal number of hidden-layer neurons may cause redundancy in the model and thus affect classification accuracy [ |
| | (iii) Training speed is comparatively high for not-so-large-scale data [ | (iii) Plenty of room remains for algorithmic advances to make it more compatible with HSI data [ |
| | (iv) Solid knowledge-based models can be generated [ | |
| | (v) Achieves greater classification accuracies for unlabeled HSIs [ | |
| Deep learning | (i) Diverse, unstructured, and unlabeled raw HSI datasets are finely processed; preprocessing of the data is not needed [ | (i) Suffers from the lack of large amounts of HSI data, which are practically unavailable [ |
| | (ii) Capable of addressing supervised, semi-supervised, and especially unsupervised learning problems [ | (ii) Extreme expense to generate an appropriate model by training a complex data structure like HSI [ |
| | (iii) Expertise in dimension reduction, denoising, and feature extraction as embedded properties [ | (iii) Low interpretability [ |
| | (iv) Addresses issues such as the Hughes phenomenon, overfitting, and convergence in an illustrious manner [ | (iv) Theoretically not well grounded, hence it is hard to comprehend where an error occurs and how to rectify it [ |
| | (v) Robust and adaptive to new features introduced into the dataset [ | (v) High time and space complexity and computational hazard [ |
| | (vi) Hidden-layer neurons prove eminent in training the desired model with highly qualified prior knowledge (DBN, RNN, CNN) [ | |
| | (vii) Computational efficiency with high performance speed (CNN, SAE) [ | |
| | (viii) Data augmentation facility (GAN) [ | |
| Transfer learning | (i) Combines different models, whether traditional or modern machine learning techniques, into a highly improved hybrid model [ | (i) Data overfitting [ |
| | (ii) Capable of transferring knowledge from the source domain (a pretrained model) to the target domain (a new model) to enrich it [ | (ii) Complex model structure [ |
| | (iii) Greater feature extraction and selection capability [ | (iii) Less interpretability |
| | (iv) Stable model with highly optimized parameters and hyperparameters [ | (iv) Difficulty in implementation |
| | (v) High training speed and accuracy with low computational cost [ | |
| | (vi) Reduced computational cost and training-time complexity [ | |