Masoumeh Aghababaei, Ataollah Ebrahimi, Ali Asghar Naghipour, Esmaeil Asadi, Adrián Pérez-Suay, Miguel Morata, Jose Luis Garcia, Juan Pablo Rivera Caicedo, Jochem Verrelst.
Abstract
Accurate plant-type (PT) detection forms an important basis for sustainable land management that maintains biodiversity and ecosystem services. In this sense, Sentinel-2 satellite images of the Copernicus program offer spatial, spectral, temporal, and radiometric characteristics with great potential for mapping and monitoring PTs. In addition, the best-performing algorithm needs to be selected to obtain PT classifications that are as accurate as possible. To date, no freely downloadable toolbox exists that brings the diversity of the latest supervised machine-learning classification algorithms (MLCAs) together into a single intuitive, user-friendly graphical user interface (GUI). To fill this gap and to facilitate and automate the usage of MLCAs, here we present a novel GUI software package that allows systematic training, validation, and application of pixel-based MLCA models to remote sensing imagery. The so-called MLCA toolbox has been integrated within ARTMO's software framework, developed in Matlab, which implements most of the state-of-the-art methods in the machine learning community. To demonstrate its utility, we chose a heterogeneous case study scene, a landscape in southwest Iran, to map PTs. In this area, four main PTs were identified: shrub land, grass land, semi-shrub land, and shrub land-grass land vegetation. Developing 21 MLCAs using the same training and validation datasets led to varying accuracy results. The Gaussian process classifier (GPC) was validated as the top-performing classifier, with an overall accuracy (OA) of 90%. GPC follows a Laplace approximation to the Gaussian likelihood under the supervised classification framework, emerging as a very competitive alternative to common MLCAs. Random forests resulted in the second-best performance with an OA of 86%.
Two other types of ensemble-learning algorithms, i.e., tree-ensemble learning (bagging) and decision tree (with error-correcting output codes), yielded OAs of 83% and 82%, respectively. Subsequently, thirteen classifiers reported OAs between 70% and 80%, and the remaining four classifiers reported OAs below 70%. We conclude that GPC substantially outperformed all classifiers and thus provides enormous potential for the classification of a diversity of land-cover types. In addition, its probabilistic formulation provides valuable band ranking information, as well as associated predictive variance at a pixel level. Nevertheless, as these are supervised (data-driven) classifiers, performance depends on the entered training data, meaning that an assessment of all MLCAs is crucial for any application. Our analysis demonstrated the efficacy of ARTMO's MLCA toolbox for an automated evaluation of the classifiers and subsequent thematic mapping.
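The core toolbox workflow described above (train many supervised classifiers on identical training/validation splits, then rank them by overall accuracy) can be sketched in Python with scikit-learn. This is an illustrative analogy only: the MLCA toolbox itself is Matlab-based, and the synthetic data here merely stands in for labeled Sentinel-2 pixels.

```python
# Hypothetical sketch of the MLCA-style benchmarking loop: several supervised
# classifiers trained on the same split and ranked by overall accuracy (OA).
# Synthetic data; not the paper's Sentinel-2 scene or its Matlab implementation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for labeled pixels: 10 "bands", 4 plant-type classes
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            random_state=0)

classifiers = {
    "GPC": GaussianProcessClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Tree EL (bag)": BaggingClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

# Same training and validation data for every classifier, as in the toolbox
results = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_val, clf.predict(X_val))

for name, oa in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: OA = {oa:.1%}")
```

Because all models see identical data, differences in OA reflect the algorithms themselves, which is the premise of the paper's 21-classifier comparison.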
Keywords: Automated Radiative Transfer Models Operator; Gaussian process classifier; Sentinel-2; machine-learning classification toolbox; plant types
Year: 2022 PMID: 36172268 PMCID: PMC7613646 DOI: 10.3390/rs14184452
Source DB: PubMed Journal: Remote Sens (Basel) ISSN: 2072-4292 Impact factor: 5.349
Figure 1Schematic overview of ARTMO’s v3.29 modules (RTMs, toolboxes, tools).
Parametric classifiers implemented in the MLCA toolbox.
| Classifier | Description | Ref. |
|---|---|---|
| Discriminant Analysis (DA) | DA is a linear model for classification and dimensionality reduction, most commonly used for feature extraction in pattern classification problems. Fisher first formulated the linear discriminant for two classes in 1936, and C. R. Rao generalized it for multiple classes in 1948. LDA projects data from a D-dimensional feature space down to a D′-dimensional space (D > D′) in a way that maximizes the variability between the classes and reduces the variability within the classes. The quadratic DA is also known as maximum likelihood classification within popular remote sensing software packages. | [ |
| Naive Bayes (NB) | The NB is a classification algorithm based on the concept of the Bayes theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable. | [ |
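The two parametric classifiers above can be illustrated with their scikit-learn equivalents (the toolbox itself is Matlab-based, so names and defaults here are assumptions, not the paper's configuration). The example uses the iris dataset purely as a stand-in for labeled pixels.

```python
# Hedged sketch of the parametric classifiers: LDA (with its D -> D'
# dimensionality reduction), QDA (the "maximum likelihood" classifier of
# remote sensing packages), and Gaussian naive Bayes.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

# LDA doubles as supervised dimensionality reduction: project the
# D-dimensional features onto D' < D discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
X_reduced = lda.transform(X)  # shape (n_samples, 2)

# QDA: one Gaussian per class with its own covariance matrix.
qda = QuadraticDiscriminantAnalysis().fit(X, y)

# Naive Bayes: assumes conditional independence between features per class.
nb = GaussianNB().fit(X, y)

print(X_reduced.shape, qda.score(X, y), nb.score(X, y))
```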
Non-parametric (machine learning) classifiers implemented in the MLCA toolbox.
| Classifier | Description | Ref. |
|---|---|---|
| Nearest neighbor (NN) | The principle behind NN methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The basic NN classification uses uniform weights; that is, the value assigned to a query point is computed from a simple majority vote of the nearest neighbors. | [ |
| Decision trees (DT) | Classification trees (CT) fit a binary decision tree for multiclass classification. See also: | [ |
| Neural networks (NN) | ANNs in their basic form are essentially fully connected layered structures of artificial neurons (AN). An AN is basically a pointwise nonlinear function (e.g., a sigmoid or Gaussian function) applied to the output of a linear regression. ANs in different neural layers are interconnected with weighted links. The most common ANN structure is a feed-forward ANN, where information flows in a unidirectional forward mode. From the input nodes, data pass through hidden nodes (if any) toward the output nodes. The following algorithms have been implemented: | [ |
| Ensemble learners (EL) | EL combines a set of trained weak learner models and the data on which these learners were trained. EL can predict an ensemble response for new data by aggregating predictions from its weak learners. The following EL are provided: (1) discriminant EL, (2) k-nearest neighbor (KNN) EL, (3) tree EL (bagging), (4) tree EL (AdaBoost), (5) tree EL (RUSBoost). Bagging and boosting techniques are typically applied to decision trees. Bagging generally constructs deep trees; this construction is both time-consuming and memory-intensive and also leads to relatively slow predictions. Boosting algorithms generally use very shallow trees, which require relatively little time or memory. However, for effective predictions, boosted trees might need more ensemble members than bagged trees. See also: | [ |
| Error-correcting output codes (ECOC) | The ECOC method is a technique that reframes a multi-class classification problem as multiple binary classification problems, allowing native binary classification models to be used directly. Unlike one-vs-rest and one-vs-one methods, which offer a similar solution by dividing a multi-class classification problem into a fixed number of binary classification problems, the error-correcting output codes technique allows each class to be encoded as an arbitrary number of binary classification problems. When an overdetermined representation is used, the extra models act as “error-correction” predictions that can result in better predictive performance. The following ECOC are provided: (1) discriminant analysis, (2) kernel classification, (3) KNN, (4) linear classification, (5) naive Bayes classification, (6) decision tree, (7) support vector machine. See also | [ |
| Gaussian process (GP) | The GP is a stochastic process in which each random variable follows a multivariate normal distribution. The goal is to learn a mapping from the input data to their corresponding classification labels, which can then be applied to new, unseen data pixels. When the GP is developed with kernel methods, it maps the original data into a possibly infinite-dimensional space in which the input-output relation can be better estimated, as it considers more complex and flexible functions than linear models. As the GP is based on a probabilistic framework, it provides an uncertainty estimate per sample. This measurement is useful for decision making, as it indicates how confident one can be in the inferred classification label. Moreover, the GP can use more sophisticated kernel functions than the standard linear kernel or the radial basis function (RBF) kernel | [ |
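The GP entry's key property, a per-sample uncertainty alongside the predicted label, can be sketched with scikit-learn's `GaussianProcessClassifier`, which also uses a Laplace approximation. The RBF kernel and synthetic data below are illustrative assumptions, not the paper's setup.

```python
# Sketch of probabilistic GP classification: each "pixel" gets per-class
# posterior probabilities, from which a label and a confidence proxy follow.
# Illustrative kernel/data; not the paper's Matlab GPC configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=4, random_state=1)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                random_state=1).fit(X, y)

proba = gpc.predict_proba(X)   # per-class posterior probabilities
labels = proba.argmax(axis=1)  # inferred class per "pixel"
# Simple per-sample confidence proxy: probability of the winning class.
# Low values flag pixels where the thematic map should be trusted less.
confidence = proba.max(axis=1)

print(labels[:5], confidence[:5].round(2))
```

This per-pixel confidence is the same idea behind the uncertainty map the paper derives from the GPC alongside its thematic map.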
Figure 2Schematic overview of ARTMO’s MLCA toolbox. The toolbox is on top, and the main GUIs are underneath.
Figure 3The LabelMeClass tool for extracting labeled data from imagery.
Figure 4Location of Marjan in the Chaharmahal-Va-Bakhtiari province in southwest Iran: (a) Iran border; (b) Chaharmahal-Va-Bakhtiari border; (c) study area border (Marjan).
Accuracy results against validation data for all MLCAs. Results are ordered from best overall accuracy (OA) to worst.
| MLCA | OA |
|---|---|
| Gaussian process classifier | 90.0% |
| Random forest | 86.5% |
| Tree EL (bag) | 83.1% |
| Decision tree (ECOC) | 82.0% |
| Discriminant analysis (ECOC) | 79.7% |
| Neural network (Adam) | 79.0% |
| Classification trees | 78.6% |
| Discriminant analysis (quadratic) | 78.6% |
| k-nearest neighbors (ECOC) | 76.4% |
| Neural network (trainbr) | 74.1% |
| Support vector machines (ECOC) | 74.1% |
| Linear classification (ECOC) | 74.0% |
| Neural network (trainscg) | 73.0% |
| Naive Bayes | 72.0% |
| Neural network (trainlm) | 72.0% |
| Tree EL (AdaBoost) | 70.7% |
| Discriminant EL | 69.6% |
Figure 5Left: confusion matrix of GPC against validation data, with correct detections shaded blue and wrong detections shaded red. Summary percentages per class are also provided. Right: polar plot of GPC band relevance for the four classes, calculated according to the equations described in [74]. The further a band lies from the center, the more important it is.
Figure 6Left: thematic map of PTs as obtained from the top-performing Gaussian process classifier (GPC). Right: Associated uncertainty map as expressed by standard deviation. The higher the value, the more uncertain.
Figure 7Thematic maps of PTs as obtained from the second- to seventh-best validated classifiers (see Table 3). RF: random forests, TEL: tree-ensemble learning (bag), DT: decision tree (ECOC), DA: discriminant analysis (ECOC), NN: neural network (Adam), CT: classification trees.