Katja Berger, Juan Pablo Rivera Caicedo, Luca Martino, Matthias Wocher, Tobias Hank, Jochem Verrelst.
Abstract
The current exponential increase of spatiotemporally explicit data streams from satellite-based Earth observation missions offers promising opportunities for global vegetation monitoring. Intelligent sampling through active learning (AL) heuristics provides a pathway for fast inference of essential vegetation variables by means of hybrid retrieval approaches, i.e., machine learning regression algorithms trained by radiative transfer model (RTM) simulations. In this study we summarize AL theory and perform a brief systematic literature survey of AL heuristics used in the context of Earth observation regression problems over terrestrial targets. Across all relevant studies, it appeared that: (i) models trained on AL-optimized training data sets achieved higher retrieval accuracy than models trained on large randomly sampled data sets, and (ii) the Euclidean distance-based diversity (EBD) method tends to be the most efficient AL technique in terms of accuracy and computational demand. Additionally, a case study based on experimental data is presented, employing both uncertainty and diversity AL criteria. Here, a training database simulated by the PROSAIL-PRO canopy RTM is used to demonstrate the benefit of AL techniques for the estimation of total leaf carotenoid content (Cxc) and leaf water content (Cw). Gaussian process regression (GPR) was incorporated to minimize and optimize the training data set with AL. Training the GPR algorithm on AL-optimized sampled data sets led to improved variable retrievals compared to training on full data pools, which is further demonstrated on a mapping example. From these findings we can recommend the use of AL-based sub-sampling procedures to select the most informative samples out of large training data pools. This will not only optimize regression accuracy due to the exclusion of redundant information, but also speed up processing time and reduce the final model size of kernel-based machine learning regression algorithms, such as GPR.
With this study we want to encourage further testing and implementation of AL sampling methods for hybrid retrieval workflows. AL can contribute to the solution of regression problems within the framework of operational vegetation monitoring using satellite imaging spectroscopy data, and may strongly facilitate data processing for cloud-computing platforms.
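The diversity idea behind EBD can be illustrated with a minimal sketch: at each iteration, the pool samples whose spectra lie farthest (in Euclidean distance) from the spectra already in the training set are moved into it. The function names, the toy two-band "spectra", the batch size, and the number of iterations below are all assumptions made for illustration; this is not the implementation used in the study, and it omits the regression step (e.g., GPR) that would normally be retrained and evaluated after each iteration.

```python
import math
import random


def min_distance_to_set(candidate, selected):
    """Smallest Euclidean distance from one candidate to any selected sample."""
    return min(math.dist(candidate, s) for s in selected)


def ebd_select(pool, selected, n_add):
    """Return indices of the n_add pool samples farthest from the selected set.

    pool and selected are lists of equal-length feature vectors (for example,
    simulated reflectance spectra). Hypothetical helper for illustration only.
    """
    scores = [(min_distance_to_set(x, selected), i) for i, x in enumerate(pool)]
    scores.sort(reverse=True)  # most distant (most diverse) candidates first
    return [i for _, i in scores[:n_add]]


# Toy data: random 2-band "spectra" standing in for RTM simulations.
random.seed(0)
pool = [[random.random(), random.random()] for _ in range(200)]
train = [pool.pop() for _ in range(5)]  # small initial training set

for _ in range(10):  # ten AL iterations, adding 3 samples each
    picked = ebd_select(pool, train, n_add=3)
    # Remove in descending index order so earlier pops do not shift later ones.
    for i in sorted(picked, reverse=True):
        train.append(pool.pop(i))

print(len(train), len(pool))
```

Note that this sketch ranks candidates only against the current training set, so samples picked within the same batch may still be close to each other; selecting one sample at a time avoids this at a higher computational cost.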
Keywords: EnMAP; Gaussian process regression; hyperspectral; optimal experimental design; query strategies
Year: 2021 PMID: 36081683 PMCID: PMC7613397 DOI: 10.3390/rs13020287
Source DB: PubMed Journal: Remote Sens (Basel) ISSN: 2072-4292 Impact factor: 5.349
Studies using AL strategies for regression problems in the context of terrestrial Earth observation data analysis: remote sensors, estimated vegetation traits (abbreviations in Section 3.2), applied machine learning regression algorithms (ML algorithm, abbreviations in Sections 3.2 and 3.3) and active learning strategies (AL method, best performing in bold, abbreviations in Section 2.3, Appendixes A.1 and A.2).
| References | Sensors | Estimated Traits | ML Algorithms | AL Methods |
|---|---|---|---|---|
| Verrelst et al. [ | Sentinel-3 OLCI | LAI, | KRR, GPR | |
| Upreti et al. [ | Sentinel-2 | LAI, | GPR | |
| Verrelst et al. [ | EnMAP (resampled) | | KRR, VHGPR | |
| Upreti et al. [ | VEN | Fcover | GPR | EBD |
| Pipia et al. [ | Sentinel-2 | green LAI | GPR | PAL, EQB, RSAL, |
Figure 1. Hybrid retrieval workflow employing PROSAIL-PRO (adapted from [19]). The RTM was used to create the simulated training database, which represents the “unlabeled” data pool. Sample selection is performed with AL heuristics by means of GPR algorithms to establish a specific retrieval model for functional vegetation traits. Output maps provide estimates along with corresponding uncertainty; exemplary maps from Estévez et al. [25].
Figure 2. RMSE for retrieval of Cxc (a) and Cw (b) applying six different AL methods and random sampling (RS) on a PROSAIL-PRO simulated training database with GPR.
Figure 3. Mapping leaf water content (Cw) using GPR trained over a full training database (left) and using EBD-optimized sampling (right): Cw estimates in cm (a,d), absolute uncertainty in the form of the standard deviation in cm (SD; b,e), and relative uncertainty in the form of the coefficient of variation in % (CV; c,f).
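For reference, the two uncertainty measures in Figure 3 are related as follows, assuming the usual definition of the coefficient of variation. The symbols $\mu$ (per-pixel GPR mean estimate) and $\sigma$ (per-pixel GPR predictive standard deviation, SD) are notation introduced here, not taken from the source:

```latex
\mathrm{CV}\,[\%] = \frac{\sigma}{\mu} \times 100
```

Under this definition, a pixel with a given SD has a higher CV where the estimated Cw is low, so the two maps can highlight different areas.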