Jan Borgelt, Jorge Sicacha-Parada, Olav Skarpaas, Francesca Verones.
Abstract
Despite being central for understanding both global biodiversity patterns and associated anthropogenic impacts, species range maps are currently only available for a small subset of global biodiversity. Here, we provide a set of assembled spatial data for terrestrial vascular plants listed on the global IUCN Red List. The dataset consists of pre-defined native regions for 47,675 species, density of available native occurrence records for 30,906 species, and standardized, large-scale Maxent predictions for 27,208 species, highlighting environmentally suitable areas within species' native regions. The data were generated in an automated approach consisting of data scraping and filtering, variable selection, model calibration and model selection. Generated Maxent predictions were validated by comparing a subset to available expert-drawn range maps from IUCN (n = 4,257), as well as by qualitatively inspecting predictions for randomly selected species. We expect these data to serve as a substitute whenever expert-drawn species range maps are not available for conducting large-scale analyses on biodiversity patterns and associated anthropogenic impacts.
Year: 2022 PMID: 35351917 PMCID: PMC8964733 DOI: 10.1038/s41597-022-01233-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Schematic summary of the dataset. Top: Native region extents were retrieved from Kew’s Plants of the World Online. Middle: Occurrence data were retrieved from the Global Biodiversity Information Facility (GBIF)[24] and filtered into three occurrence data types: raw data (blue), presence cells (grey) and thinned data (yellow). Bottom: The occurrence data types were used in Maxent models to predict relative environmental suitability indices within native regions (i.e. range estimates). Model 0 differs from Models 1 to 3 as follows. Model 0 was trained on raw data to support variable selection, using k-fold cross-validated Maxent models (one model for each combination of feature classes, i.e. linear (L), quadratic (Q), hinge (H), product (P) and threshold (T)). For Models 1 to 3, the selected variables and each of the three occurrence data types were used to train a set of separate k-fold cross-validated Maxent models (one model for each possible combination of feature classes, regularization multipliers and occurrence data type). The overall best-performing model was selected for each species based on performance metrics.
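The candidate-model grid described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes that every non-empty subset of the five feature classes is a candidate, the regularization multiplier values are hypothetical (the caption does not list them), and `select_best` with its `auc` key is an illustrative stand-in for whatever performance metrics drove the selection.

```python
from itertools import combinations, product

# Feature classes named in the caption: linear, quadratic, hinge, product, threshold.
FEATURE_CLASSES = ["L", "Q", "H", "P", "T"]
# The three occurrence data types named in the caption.
OCCURRENCE_TYPES = ["raw", "presence_cells", "thinned"]
# Hypothetical values; the source does not state which multipliers were used.
REG_MULTIPLIERS = [0.5, 1.0, 2.0]


def feature_class_combinations(classes):
    """Return every non-empty combination of feature classes, e.g. 'L', 'LQ', 'LQHPT'."""
    combos = []
    for size in range(1, len(classes) + 1):
        for combo in combinations(classes, size):
            combos.append("".join(combo))
    return combos


def candidate_models(classes, multipliers, occurrence_types):
    """Build one settings dict per (feature classes, multiplier, occurrence type)."""
    grid = product(feature_class_combinations(classes), multipliers, occurrence_types)
    return [
        {"features": fc, "regularization": rm, "occurrences": occ}
        for fc, rm, occ in grid
    ]


def select_best(scored_models, metric="auc"):
    """Pick the candidate with the highest value of the given (assumed) metric."""
    return max(scored_models, key=lambda model: model[metric])


grid = candidate_models(FEATURE_CLASSES, REG_MULTIPLIERS, OCCURRENCE_TYPES)
print(len(feature_class_combinations(FEATURE_CLASSES)))  # 31
print(len(grid))  # 279
```

Under these assumptions, each species gets 31 feature-class combinations times 3 multipliers times 3 occurrence data types, which is why an automated selection step is needed.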
Fig. 2 Data examples for randomly selected species and spatial coverage of the dataset. Best-performing Maxent prediction, highlighting environmentally suitable conditions within the species' native regions (i.e. modelling extent) along with retrieved occurrence records (white points) for (a) Amomum pterocarpum, (b) Cedrus libani, (c) Laburnum anagyroides, (d) Megistostegium nodulosum. Performance of the shown predictions is indicated by the maximum F1-score and the area under the receiver operating characteristics curve for true vs. false positive rate (AUC) and recall vs. precision (AUCPR). Bottom: number of (e) retrieved native regions, (f) retrieved occurrence records, and (g) generated Maxent predictions across the globe.
Environmental data used in this study. The layers (n = 36) are based on Karger et al.[62] and the European Space Agency’s land cover product[63].
| Variable | Code |
|---|---|
| Annual Mean Temperature | CHELSA_BIO1 |
| Mean Diurnal Range | CHELSA_BIO2 |
| Isothermality | CHELSA_BIO3 |
| Temperature Seasonality | CHELSA_BIO4 |
| Max Temperature of Warmest Month | CHELSA_BIO5 |
| Min Temperature of Coldest Month | CHELSA_BIO6 |
| Temperature Annual Range | CHELSA_BIO7 |
| Mean Temperature of Wettest Quarter | CHELSA_BIO8 |
| Mean Temperature of Driest Quarter | CHELSA_BIO9 |
| Mean Temperature of Warmest Quarter | CHELSA_BIO10 |
| Mean Temperature of Coldest Quarter | CHELSA_BIO11 |
| Annual Precipitation | CHELSA_BIO12 |
| Precipitation of Wettest Month | CHELSA_BIO13 |
| Precipitation of Driest Month | CHELSA_BIO14 |
| Precipitation Seasonality | CHELSA_BIO15 |
| Precipitation of Wettest Quarter | CHELSA_BIO16 |
| Precipitation of Driest Quarter | CHELSA_BIO17 |
| Precipitation of Warmest Quarter | CHELSA_BIO18 |
| Precipitation of Coldest Quarter | CHELSA_BIO19 |
| Fraction of mosaic cropland/natural vegetation | X30_ESA_CCI |
| Fraction of mosaic natural vegetation/cropland | X40_ESA_CCI |
| Fraction of broadleaved evergreen, closed to open, tree cover | X50_ESA_CCI |
| Fraction of broadleaved deciduous, closed to open, tree cover | X60_ESA_CCI |
| Fraction of needleleaved evergreen, closed to open, tree cover | X70_ESA_CCI |
| Fraction of needleleaved deciduous, closed to open, tree cover | X80_ESA_CCI |
| Fraction of mixed leaf type tree cover | X90_ESA_CCI |
| Fraction of mosaic tree and shrub/herbaceous cover | X100_ESA_CCI |
| Fraction of mosaic herbaceous cover/tree and shrub | X110_ESA_CCI |
| Fraction of shrubland | X120_ESA_CCI |
| Fraction of grassland | X130_ESA_CCI |
| Fraction of lichens and mosses | X140_ESA_CCI |
| Fraction of sparse vegetation | X150_ESA_CCI |
| Fraction of tree cover, flooded, fresh or brackish water | X160_ESA_CCI |
| Fraction of tree cover, flooded, saline water | X170_ESA_CCI |
| Fraction of shrub or herbaceous cover, flooded, fresh/saline/brackish water | X180_ESA_CCI |
| Fraction of bare areas | X200_ESA_CCI |
Performance of Maxent predictions in the suggested dataset. Mean and median values of area under the receiver operating characteristics curve for true vs. false positive rate (AUC) and recall vs. precision (AUCPR) for all species and across different IUCN threat categories (i.e. data-deficient (DD), least concern (LC), near-threatened (NT), vulnerable (VU), endangered (EN) and critically endangered (CR)). Calculations are based on presence-background data (n = 27,208) and on comparison to expert-based range maps retrieved from IUCN (i.e. reference range, n = 4,257).
| Metric | Reference | Statistic | DD | LC | NT | VU | EN | CR | Total |
|---|---|---|---|---|---|---|---|---|---|
| AUC | Presence-background | Mean | 0.939 | 0.937 | 0.95 | 0.96 | 0.971 | 0.957 | 0.945 |
| AUC | Presence-background | Median | 0.961 | 0.951 | 0.977 | 0.985 | 0.994 | 0.989 | 0.964 |
| AUC | Reference range | Mean | 0.817 | 0.89 | 0.927 | 0.931 | 0.929 | 0.915 | 0.902 |
| AUC | Reference range | Median | 0.852 | 0.925 | 0.972 | 0.974 | 0.98 | 0.987 | 0.943 |
| AUCPR | Presence-background | Mean | 0.576 | 0.529 | 0.656 | 0.69 | 0.749 | 0.7 | 0.589 |
| AUCPR | Presence-background | Median | 0.603 | 0.535 | 0.717 | 0.755 | 0.833 | 0.797 | 0.617 |
| AUCPR | Reference range | Mean | 0.516 | 0.664 | 0.686 | 0.653 | 0.655 | 0.592 | 0.658 |
| AUCPR | Reference range | Median | 0.527 | 0.702 | 0.737 | 0.712 | 0.699 | 0.626 | 0.702 |
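To make the reported metrics concrete, the sketch below computes AUC, AUCPR and the maximum F1-score for a toy set of predictions. It is an illustration under stated assumptions, not the authors' evaluation code: the labels and scores are invented, AUCPR is estimated as step-wise average precision (one common estimator; the source does not say which was used), and all function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_points(labels, scores):
    """(precision, recall) at each distinct score threshold, highest threshold first."""
    total_pos = sum(labels)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((tp / (tp + fp), tp / total_pos))
    return points


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve."""
    area, prev_recall = 0.0, 0.0
    for precision, recall in precision_recall_points(labels, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def max_f1(labels, scores):
    """Largest F1-score over all distinct score thresholds."""
    best = 0.0
    for precision, recall in precision_recall_points(labels, scores):
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data (invented): 1 = presence / inside reference range, 0 = background.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

print(round(roc_auc(labels, scores), 3))            # 0.889
print(round(average_precision(labels, scores), 3))  # 0.917
print(round(max_f1(labels, scores), 3))             # 0.857
```

Unlike AUC, the precision-recall area depends on the share of positives among all cells, which may be one reason the AUCPR values in the table sit well below the corresponding AUC values.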
Fig. 3 Performance metrics for the suggested Maxent predictions. (a) Number of available reference range maps used for calculating performance metrics. Average values, for species native to the corresponding regions, of the area under the receiver operating characteristics curve for (b) true vs. false positive rate (AUC) and (c) recall vs. precision (AUCPR). (d) Mean and standard deviation of AUC (blue) and AUCPR (yellow) per rounded log-transformed number of raw occurrence data points (left) and for species in different IUCN red list categories (right), i.e. data-deficient (DD), least concern (LC), near-threatened (NT), vulnerable (VU), endangered (EN) and critically endangered (CR). Significant differences across IUCN categories in (d) are indicated by different letters in bars for AUC (white text) and AUCPR (black text).
| Field | Value |
|---|---|
| Measurement(s) | species distributions |
| Technology Type(s) | machine learning |
| Sample Characteristic - Organism | Tracheophyta |
| Sample Characteristic - Environment | terrestrial natural environment |