Alizée Hendrickx, Cedric Marsboom, Laura Rinaldi, Hannah Rose Vineer, Maria Elena Morgoglione, Smaragda Sotiraki, Giuseppe Cringoli, Edwin Claerebout, Guy Hendrickx.
Abstract
Dicrocoelium dendriticum is a trematode that infects ruminant livestock and requires two different intermediate hosts to complete its lifecycle. Modelling the spatial distribution of this parasite can help to improve its management in higher risk regions. The aim of this research was to assess the constraints of using historical data sets when modelling the spatial distribution of helminth parasites in ruminants. A parasitological data set provided by CREMOPAR (Napoli, Italy) and covering most of Italy was used in this paper. A baseline model (Random Forest, VECMAP®) using the entire data set was first used to determine the minimal number of data points needed to build a stable model. Then, annual distribution models were computed and compared with the baseline model. The best prediction rate and statistical output were obtained for 2012 and the worst for 2016, even though the sample size of the former was significantly smaller than the latter. We discuss how this may be explained by the fact that in 2012, the samples were more evenly geographically distributed, whilst in 2016 most of the data were strongly clustered. It is concluded that the spatial distribution of the input data appears to be more important than the actual sample size when computing species distribution models. This is often a major issue when using historical data to develop spatial models. Such data sets often include sampling biases and large geographical gaps. If this bias is not corrected, the spatial distribution model outputs may display the sampling effort rather than the real species distribution. © A. Hendrickx et al., published by EDP Sciences, 2021.
Keywords: Dicrocoelium dendriticum; Distribution; Italy; Prevalence; Ruminants; Spatial modeling
Year: 2021; PMID: 34047693; PMCID: PMC8162060; DOI: 10.1051/parasite/2021042
Source DB: PubMed; Journal: Parasite; ISSN: 1252-607X; Impact factor: 3.000
Figure 1. General approach to designing species distribution maps.
Figure 2. Administrative regions of Italy referred to in the paper.
Environmental co-variates (predictor data).
| Abbreviation | Variable |
|---|---|
| NDVI_14A0 | Normalised difference vegetation index transformed Fourier analysis band 14 – A0 – mean |
| NDVI_14A1 | Normalised difference vegetation index transformed Fourier analysis band 14 – A1 – amplitude of annual cycle |
| NDVI_14A2 | Normalised difference vegetation index transformed Fourier analysis band 14 – A2 – amplitude of bi-annual cycle |
| NDVI_14A3 | Normalised difference vegetation index transformed Fourier analysis band 14 – A3 – amplitude of tri-annual cycle |
| NDVI_14D1 | Normalised difference vegetation index transformed Fourier analysis band 14 – D1 – variance in annual cycle |
| NDVI_14D2 | Normalised difference vegetation index transformed Fourier analysis band 14 – D2 – variance in bi-annual cycle |
| NDVI_14D3 | Normalised difference vegetation index transformed Fourier analysis band 14 – D3 – variance in tri-annual cycle |
| NDVI_14DA | Normalised difference vegetation index transformed Fourier analysis band 14 – DA – combined variance in annual, bi-annual, and tri-annual cycles |
| NDVI_14MN | Normalised difference vegetation index transformed Fourier analysis band 14 – MN – minimum |
| NDVI_14MX | Normalised difference vegetation index transformed Fourier analysis band 14 – MX – maximum |
| NDVI_14P1 | Normalised difference vegetation index transformed Fourier analysis band 14 – P1 – phase of annual cycle |
| NDVI_14P2 | Normalised difference vegetation index transformed Fourier analysis band 14 – P2 – phase of bi-annual cycle |
| NDVI_14P3 | Normalised difference vegetation index transformed Fourier analysis band 14 – P3 – phase of tri-annual cycle |
| NDVI_14VR | Normalised difference vegetation index transformed Fourier analysis band 14 – VR – variance in raw data parameter Fourier variable image values |
| BIO 1 | Annual mean temperature (°C) |
| BIO 2 | Annual mean diurnal range (°C) |
| BIO 3 | Isothermality (BIO 2 / BIO 7 × 100) (%) |
| BIO 4 | Temperature seasonality (standard deviation) (°C) |
| BIO 5 | Tmax of warmest month (°C) |
| BIO 6 | Tmin of coldest month (°C) |
| BIO 7 | Annual temperature range (°C) |
| BIO 8 | Mean temperature of wettest quarter (°C) |
| BIO 9 | Mean temperature of driest quarter (°C) |
| BIO 10 | Mean temperature of warmest quarter (°C) |
| BIO 11 | Mean temperature of coldest quarter (°C) |
| BIO 12 | Annual precipitation (mm) |
| BIO 13 | Precipitation of wettest month (mm) |
| BIO 14 | Precipitation of driest month (mm) |
| BIO 15 | Precipitation seasonality (coefficient of variation) (%) |
| BIO 16 | Precipitation of wettest quarter (mm) |
| BIO 17 | Precipitation of driest quarter (mm) |
| BIO 18 | Precipitation of warmest quarter (mm) |
| BIO 19 | Precipitation of coldest quarter (mm) |
| TempXX_A0 | Temperature of XX (depending on the year that is modelled 09, 12, 13, 14, 15, 16) – amplitude of annual cycle |
| TempXX_A1 | Temperature of XX (depending on the year that is modelled 09, 12, 13, 14, 15, 16) – amplitude of bi-annual cycle |
| TempXX_A2 | Temperature of XX (depending on the year that is modelled 09, 12, 13, 14, 15, 16) – amplitude of tri-annual cycle |
| TempXX_P0 | Temperature of XX (depending on the year that is modelled 09, 12, 13, 14, 15, 16) – phase of annual cycle |
| TempXX_P1 | Temperature of XX (depending on the year that is modelled 09, 12, 13, 14, 15, 16) – phase of bi-annual cycle |
| TempXX_P2 | Temperature of XX (depending on the year that is modelled 09, 12, 13, 14, 15, 16) – phase of tri-annual cycle |
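The NDVI and temperature covariates in the table above are harmonic summaries (mean, amplitudes and phases of annual, bi-annual and tri-annual cycles). The following is a minimal sketch of how such terms can be derived from one year of equally spaced samples with a discrete Fourier transform. The function name `harmonic_summary`, the 12-value monthly series and the phase convention are illustrative assumptions, not the processing chain used to produce the covariates.

```python
import cmath
import math

def harmonic_summary(series, n_harmonics=3):
    """Mean, amplitudes and phases of the first harmonics of one year of data.

    Hypothetical helper: `series` holds equally spaced samples covering
    exactly one year (for example 12 monthly composites).
    """
    n = len(series)
    mean = sum(series) / n
    amplitudes, phases = [], []
    for k in range(1, n_harmonics + 1):
        # Discrete Fourier coefficient for k cycles per year.
        coeff = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(series)
        )
        amplitudes.append(2 * abs(coeff) / n)  # amplitude of the k-th cycle
        phases.append(cmath.phase(coeff))      # radians; sign convention is an assumption
    return mean, amplitudes, phases

# Synthetic monthly signal: mean 0.5, annual amplitude 0.2, bi-annual amplitude 0.05.
monthly = [
    0.5 + 0.2 * math.cos(2 * math.pi * t / 12) + 0.05 * math.cos(4 * math.pi * t / 12)
    for t in range(12)
]
mean, amps, phases = harmonic_summary(monthly)
```

With this synthetic input, the mean and the first two amplitudes recover the values used to build the signal, and the third amplitude is numerically zero.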
Statistical output of the baseline Random Forest (RF) model for Dicrocoelium dendriticum: best-case scenario (BCS, full data set) and successive 10% reductions in sample size.
| | BCS | −10% | −20% | −30% | −40% | −50% | −60% | −70% | −80% | −90% |
|---|---|---|---|---|---|---|---|---|---|---|
| Presence points | 2508 | 2258 | 2008 | 1758 | 1508 | 1258 | 1008 | 758 | 508 | 258 |
| Kappa | 0.61 | 0.56 | 0.54 | 0.60 | 0.53 | 0.56 | 0.54 | 0.48 | 0.53 | 0.54 |
| AUC | 0.72 | 0.71 | 0.71 | 0.71 | 0.72 | 0.70 | 0.70 | 0.67 | 0.71 | 0.71 |
| Sensitivity | 0.68 | 0.68 | 0.66 | 0.69 | 0.67 | 0.64 | 0.64 | 0.62 | 0.66 | 0.70 |
| Specificity | 0.64 | 0.64 | 0.63 | 0.63 | 0.65 | 0.63 | 0.65 | 0.60 | 0.65 | 0.61 |
| Predictor importance | Bio11 | Bio11 | Bio11 | Bio05 | Bio01 | Bio09 | Bio01 | Bio11 | Bio11 | Bio01 |
| | Bio05 | Bio08 | Bio01 | Bio11 | Bio09 | Bio11 | Bio09 | Bio05 | Bio01 | Bio10 |
| | Bio07 | Bio06 | Bio12 | Bio06 | Bio11 | Bio16 | Bio11 | Bio01 | Bio06 | Bio05 |
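The presence-point counts in the table above are consistent with removing a fixed step of 10% of the full data set, rounded down, at each reduction. The sketch below reproduces that arithmetic. The function name `reduced_sample_sizes` is hypothetical, and the rounding rule is inferred from the table values rather than stated in the source.

```python
def reduced_sample_sizes(n_full, steps=9, fraction=0.10):
    """Sample sizes after removing `fraction` of the full set, step by step.

    Assumption: the step is a fixed count, floor(fraction * n_full), as the
    table values suggest; the rounding rule is not stated in the source.
    """
    step = int(n_full * fraction)  # 2508 * 0.10 = 250.8, truncated to 250
    return [n_full - i * step for i in range(steps + 1)]

sizes = reduced_sample_sizes(2508)
print(sizes)
# [2508, 2258, 2008, 1758, 1508, 1258, 1008, 758, 508, 258]
```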
Statistical output of the RF models for Dicrocoelium dendriticum: BCS and −70% baseline models compared with the annual models for 2009, 2012, 2013, 2015 and 2016.
| | BCS | −70% | 2009 | 2012 | 2013 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|
| Presence points | 2508 | 758 | 175 | 163 | 134 | 120 | 415 |
| Kappa | 0.61 | 0.48 | 0.47 | 0.59 | 0.48 | 0.56 | 0.47 |
| AUC | 0.72 | 0.67 | 0.65 | 0.74 | 0.67 | 0.73 | 0.62 |
| Sensitivity | 0.68 | 0.61 | 0.60 | 0.67 | 0.57 | 0.70 | 0.61 |
| Specificity | 0.64 | 0.60 | 0.62 | 0.69 | 0.62 | 0.64 | 0.60 |
| Predictor importance | Bio11 | Bio11 | Bio16 | Bio12 | Bio12 | Bio12 | Bio13 |
| | Bio05 | Bio05 | Bio13 | Bio16 | Bio18 | Bio16 | Bio19 |
| | Bio07 | Bio01 | Bio15 | Bio19 | Bio16 | Bio14 | Bio12 |
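The four statistics reported in the tables above (Kappa, AUC, sensitivity, specificity) can all be computed from observed presence/absence labels and model scores. The sketch below shows the standard definitions on a small made-up example. The function names, the 0.5 threshold and the data are illustrative assumptions and do not reproduce the evaluation carried out in VECMAP®.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 1 = presence, 0 = absence."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def cohen_kappa(tp, tn, fp, fn):
    """Agreement beyond chance between observed and predicted classes."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed - expected) / (1 - expected)

def auc(y_true, scores):
    """Probability that a random presence outscores a random absence (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # the 0.5 threshold is an assumption

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)   # 0.75
specificity = tn / (tn + fp)   # 0.75
kappa = cohen_kappa(tp, tn, fp, fn)  # 0.5
area = auc(y_true, scores)           # 0.8125
```

Sensitivity, specificity and Kappa depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone.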
Figure 3. Data distribution for Dicrocoelium dendriticum from 1999–2018.
Figure 4. Baseline model, BCS.
Figure 5. Baseline model, −70%.
Figure 6. Data distribution for Dicrocoelium dendriticum 2012.
Figure 7. Annual distribution model 2012.
Figure 8. Data distribution for Dicrocoelium dendriticum 2016.
Figure 9. Annual distribution model 2016.
Hendrickx et al., 2020: Constraints of using historical data for modelling the spatial distribution of helminth parasites in ruminants.
| ODMAP element | Contents |
|---|---|
| Overview | |
| Authorship | |
| Model objective | |
| Taxon | Parasitic helminth, Dicrocoelium dendriticum. |
| Location | Italy. |
| Scale of analysis | |
| Biodiversity data overview | |
| Type of predictors | Vegetation, bioclimatic, livestock density. |
| Conceptual model/hypothesis | |
| Assumptions | Diagnostic data were representative of presence or absence of infection in the host. Sensitivity of diagnostic data does not change in space or time. The chosen environmental covariates represent all relevant environmental drivers of distribution. The data encompass the species’ realised niche in the area modelled (after bias-correction – see below). Sample selection bias is adequately corrected (see below). |
| SDM algorithms | |
| Model workflow | After preparation of environmental covariates, removal of errors and bias-correction (see below), Random Forest models were fitted to the full dataset, and to a reduced set of covariates identified as important in the full model. This process was repeated for incrementally reduced sample sizes (see data partitioning below) to identify the minimal sample size, below which statistical performance deteriorates. Models were also fitted to annual occurrence data using the same process. |
| Software | |
| Data | |
| Biodiversity data | |
| Data partitioning | A model was developed using the full dataset to demonstrate the “best-case scenario” (BCS), before reducing the size of the occurrence dataset in 10% increments at random, to evaluate the impact of sample size on model performance. Models were also fitted to data for the 5 years between 1999 and 2018 with the highest occurrence data sample size to evaluate the impact of dataset on model performance. |
| Predictor variables | NDVI data from MODIS; bioclimatic variables; Gridded Livestock of the World livestock density data. |
| Model | |
| Variable pre-selection | The choice of initial covariates was made as a compromise between availability and ecological/biological relevance to the study species. Only weakly correlated covariates were included in the models. |
| Multicollinearity | Multicollinearity between the covariates was investigated using the variance inflation factor and Spearman rank correlations. Covariates with VIF > 10 were discarded. Only one variable from pairs with correlations >0.7 was retained to avoid model overfitting. |
| Model settings | Default settings were used throughout, except for the number of replicates and the number of variables to evaluate at each node. For initial models using all variables (variable selection step), 500 replicates and 8 variables were specified. For models using the reduced set of variables selected for their importance, 100 replicates and 6 nodes were specified. |
| Model estimates | Covariate importance was estimated with mean decrease accuracy and mean decrease Gini. |
| Model averaging/ensembles | Not applicable. |
| Non-independence | Not done, see discussion. |
| Assessment | |
| Performance statistics | Model evaluation is based on standard model statistics. These include Sensitivity, Specificity, Cohen’s Kappa, and Area Under Curve (AUC). |
| Plausibility checks | Expert analysis is used to evaluate the plausibility of the mapped model outputs. |
| Prediction | |
| Prediction output | Predictions of relative probability of presence of Dicrocoelium dendriticum. |
| Uncertainty quantification | Not applicable – ensembles not performed. |
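The multicollinearity step in the table above retains only one variable from each pair with a Spearman rank correlation above 0.7. The sketch below illustrates that pairwise filter in plain Python. The helper names, the made-up covariate values and the rule of keeping the first variable encountered are assumptions for illustration; the VIF screening (VIF > 10) is not shown.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_correlated(covariates, threshold=0.7):
    """Keep a covariate only if |rho| <= threshold against every covariate already kept."""
    kept = []
    for name, values in covariates.items():
        if all(abs(spearman(values, covariates[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Made-up values at six locations, for illustration only.
covariates = {
    "bio01": [10.1, 12.3, 14.8, 9.5, 16.2, 13.0],
    "bio11": [2.0, 4.1, 6.5, 1.2, 8.3, 5.0],    # same rank order as bio01
    "bio12": [900, 650, 1000, 700, 800, 1200],
}
kept = drop_correlated(covariates)
print(kept)
# ['bio01', 'bio12']
```

Because the filter is greedy, the retained set depends on the order in which covariates are listed; ordering them by ecological relevance is one way to make that choice explicit.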