Florence Matutini1, Jacques Baudry2, Guillaume Pain1, Morgane Sineau3, Joséphine Pithon1.
Abstract
Species distribution models (SDMs) have been increasingly developed in recent years, but their validity is questioned. Their assessment can be improved by the use of independent data, but these can be difficult to obtain and prohibitively expensive to collect. Standardized data from citizen science may be used to establish external evaluation datasets and to improve SDM validation and applicability.

We used opportunistic presence-only data along with presence-absence data from a standardized citizen science program to establish and assess habitat suitability maps for 9 species of amphibian in western France. We assessed the performance of Generalized Additive Models and Random Forest models by (1) cross-validation on a 30% subset held out from the opportunistic dataset used to calibrate the model or (2) external validation using different independent datasets derived from citizen science monitoring. We tested the effects of applying different combinations of filters to the citizen data and of complementing it with additional standardized fieldwork.

Cross-validation with an internal evaluation dataset resulted in higher AUC (area under the receiver operating characteristic curve) than external evaluation, causing overestimation of model accuracy, and did not select the same models; models integrating sampling effort performed better with external validation. AUC, specificity, and sensitivity of models calculated with different filtered external datasets differed for some species. However, for most species, complementary fieldwork was not necessary to obtain coherent results, as long as the citizen science data were strongly filtered.

Since external validation methods using independent data are considered more robust, filtering data from citizen science may make a valuable contribution to the assessment of SDMs. Limited complementary fieldwork with volunteer participation to complete ecological gradients may also enhance citizen involvement and lead to better use of SDMs in decision processes for nature conservation.
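The three evaluation metrics named in the abstract (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. It assumes that a model returns a habitat suitability score per grid cell and that an evaluation dataset labels cells as presence or absence. The function names, the example scores, and the 0.5 threshold are illustrative assumptions and are not taken from the study.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: probability that a random presence cell receives a
    higher suitability score than a random absence cell (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def sensitivity_specificity(presence_scores, absence_scores, threshold):
    """Sensitivity = share of presence cells predicted suitable;
    specificity = share of absence cells predicted unsuitable."""
    true_pos = sum(s >= threshold for s in presence_scores)
    true_neg = sum(s < threshold for s in absence_scores)
    return true_pos / len(presence_scores), true_neg / len(absence_scores)


# Hypothetical suitability scores for evaluation cells (not study data).
presence = [0.9, 0.8, 0.6, 0.4]
absence = [0.7, 0.3, 0.2, 0.1]

print(auc(presence, absence))                           # 0.875
print(sensitivity_specificity(presence, absence, 0.5))  # (0.75, 0.75)
```

Running the same functions once on an internal hold-out set and once on an independent external set is the kind of comparison the abstract describes; only the evaluation data change, not the metric definitions.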
Keywords: amphibians; biodiversity conservation; data culling; data filtering; external evaluation; habitat suitability modeling; sampling effort
Year: 2021 PMID: 33841764 PMCID: PMC8019030 DOI: 10.1002/ece3.7210
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
FIGURE 1. Description of the datasets, filters, and complementation used to derive the external evaluation datasets. The number of data points (n grid cells) is given for T. marmoratus, for 1 iteration only and for s3. CS: data derived from the citizen science dataset; SUP: additional data collected by volunteers (VOL) and professionals (PRO)
Environmental variables used for species distribution modeling of each amphibian species in the Pays de la Loire region. Associated references are available in Appendix S2: Table S1
| Variable category | Code | Variable description | Original resolution |
|---|---|---|---|
| Climatic | CLIM_1 | First axis from a PCA on 12 worldclim variables and altitude | 2.5 arc‐min/5 km |
| | CLIM_2 | Second axis from a PCA on 12 worldclim variables and altitude | 2.5 arc‐min/5 km |
| Land cover | %WOOD_DM | Proportion of deciduous and mixed forest | 5 m |
| | %WOOD_C | Proportion of coniferous forest | 5 m |
| | %CROP | Proportion of crop | 20 m |
| | %PASTURE | Proportion of permanent pasture | 20 m |
| | NB_PONDS | Pond density (or water point density) | 5 m |
| | L_HEDGE | Hedgerow density | 5 m |
| | L_ROAD_1ST | Primary road density outside urban areas | 5 m |
| | L_ROAD_2ND | Secondary road density outside urban areas | 5 m |
| | L_RIVER | Canal and river density | 5 m |
| | %URBAN | Proportion of urban area | 20 m |
FIGURE 2. Model performance for the 9 studied species assessed with external or internal data using different pseudo‐absence selection strategies. Only AUC values for GAM are shown (see Appendix S1–S4). Artificial absence sampling strategies shown are s2 (random pseudo‐absence selection excluding known presence points) and s3 (random pseudo‐absence selection excluding known presence points and adjusted to consider site accessibility and sampling effort). For each strategy, 10 replicates of the artificial absence point generation process were run, each with 50 bootstraps for the random selection of the training set (70%) and the internal testing set (30%). The black dotted line indicates the 0.70 threshold above which models have an acceptable level of accuracy (Swets 1988)
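The difference between the two pseudo‐absence strategies can be sketched as follows. This is a simplified illustration, not the authors' procedure: how sampling effort and accessibility were actually incorporated is not described here, so the sketch assumes a hypothetical per‐cell `effort` weight and draws s3 pseudo‐absences with probability increasing with that weight. All function and variable names are assumptions.

```python
import random


def pseudo_absences_s2(cells, presence, n, rng):
    """s2-like strategy: uniform random draw among cells with no known presence."""
    candidates = [c for c in cells if c not in presence]
    return rng.sample(candidates, n)


def pseudo_absences_s3(cells, presence, effort, n, rng):
    """s3-like strategy (assumed form): weighted draw without replacement,
    favouring cells with higher sampling effort; zero-effort cells are skipped.
    Uses the key u**(1/w) and keeps the n largest keys."""
    candidates = [c for c in cells if c not in presence and effort.get(c, 0) > 0]
    keyed = sorted(
        candidates,
        key=lambda c: rng.random() ** (1.0 / effort[c]),
        reverse=True,
    )
    return keyed[:n]


# Hypothetical grid of 20 cells; cells 0-2 hold known presences and
# cells 10-19 were never visited (effort 0). Values are illustrative only.
rng = random.Random(42)
cells = list(range(20))
presence = {0, 1, 2}
effort = {c: (0 if c >= 10 else c + 1) for c in cells}

print(pseudo_absences_s2(cells, presence, 5, rng))
print(pseudo_absences_s3(cells, presence, effort, 5, rng))
```

In this toy setup the s2 draw can land in never‐visited cells, while the s3 draw is restricted to cells where an absence is more plausible because observers actually went there.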
FIGURE 3. Habitat suitability maps for six studied species produced using two forms of pseudo‐absence selection: s2 (random pseudo‐absence selection excluding known presence points) and s3 (random pseudo‐absence selection excluding known presence points and constrained to account for sampling effort). The black and white map under each pair shows the net difference between s2 and s3. Map resolution is 500 m
Model performance according to different filters and complementary fieldwork applied to the external evaluation dataset
| External dataset | Model | SEN | SPE | AUC | SEN | SPE | AUC | SEN | SPE | AUC | SEN | SPE | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | s2 | 0.63 | 0.41 | 0.58 | | 0.29 | 0.53 | | 0.43 | 0.68 | 0.71 | | 0.87 |
| | s3 | | | | 0.64 | | 0.54 | 0.78 | | 0.68 | | 0.84 | 0.88 |
| | s2 | 0.59 | 0.78 | 0.81 | | 0.76 | 0.80 | | 0.61 | 0.73 | 0.64 | | 0.86 |
| | s3 | 0.58 | | 0.81 | 0.62 | | | 0.68 | | 0.74 | | 0.84 | 0.86 |
| | s2 | 0.58 | | 0.82 | | 0.77 | 0.68 | | | 0.63 | 0.59 | | 0.85 |
| | s3 | | 0.78 | 0.81 | 0.49 | 0.78 | | 0.57 | 0.54 | 0.63 | | 0.86 | 0.85 |
| | s2 | 0.58 | | | | 0.76 | 0.61 | | | 0.51 | | | |
| | s3 | | 0.77 | 0.83 | 0.37 | | | 0.51 | 0.33 | | | | |
| STRAT_CS | s2 | 0.71 | | | 0.67 | 0.81 | 0.82 | 0.72 | 0.64 | 0.73 | 0.79 | 0.57 | 0.69 |
| | s3 | 0.70 | 0.69 | 0.78 | 0.68 | 0.82 | | 0.71 | 0.63 | | | | |
| STRAT_ALL | s2 | | | | 0.66 | 0.76 | 0.78 | | | 0.70 | 0.84 | 0.57 | 0.73 |
| | s3 | 0.78 | 0.70 | 0.83 | | | | 0.66 | 0.60 | | | 0.58 | |
External datasets used were (see Figure 1): CS.0 (all data from the standardized citizen science dataset); CS.2 + ABS (CS.2 with 10% supplementary absence cells in very unfavorable habitats); PRO (data collected by professionals only in 2018–2019); CS.2 + ABS + SUP (the citizen science data listed above plus all complementary fieldwork by professionals and volunteers); STRAT_CS (stratified data selection from CS.2 + ABS with complementary fieldwork by volunteers); STRAT_ALL (stratified data selection from CS.2 + ABS + SUP). Models assessed: s2 (random pseudo‐absence selection excluding known presence points) and s3 (random pseudo‐absence selection constrained to account for sampling effort and correct sampling bias). SEN, sensitivity; SPE, specificity. Bold values show the best value between s2 and s3 where the difference exceeds 0.02, and italic values indicate species with fewer than 2 presence records. All analyses involving random presence selection with a distance condition, or stratified random selection, were performed using 100 bootstraps (means are reported).
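The reporting rule in the table note (mean over bootstraps, then flag the better of s2 and s3 only when the difference exceeds 0.02) can be sketched as below. The function name and the bootstrap values are hypothetical and serve only to illustrate the rule; they are not values from the study.

```python
from statistics import mean


def compare_strategies(s2_runs, s3_runs, min_delta=0.02):
    """Average a metric over bootstrap runs for each strategy and return
    (mean_s2, mean_s3, best), where best is "s2", "s3", or None when the
    absolute difference does not exceed min_delta."""
    m2, m3 = mean(s2_runs), mean(s3_runs)
    # Round the difference to avoid floating-point noise near the cutoff.
    delta = round(m3 - m2, 6)
    if delta > min_delta:
        best = "s3"
    elif delta < -min_delta:
        best = "s2"
    else:
        best = None
    return m2, m3, best


# Hypothetical AUC values from three bootstrap runs per strategy.
print(compare_strategies([0.60, 0.62, 0.61], [0.70, 0.72, 0.71]))
print(compare_strategies([0.87, 0.87, 0.87], [0.88, 0.88, 0.88]))
```

In the first call the s3 mean is about 0.10 higher, so s3 would be flagged; in the second the gap is 0.01, so neither value would be highlighted.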