Simone Franceschini, Emanuele Gandola, Marco Martinoli, Lorenzo Tancioni, Michele Scardi.
Abstract
Species distribution is the result of complex interactions that involve environmental parameters as well as biotic factors. However, methodological approaches that use biotic variables during the prediction process are still largely lacking. Here, a cascaded Artificial Neural Network (ANN) approach is proposed in order to increase the accuracy of fish species occurrence estimates, and a case study for Leucos aula in NE Italy is presented as a demonstration. Potentially useful biotic information (i.e. occurrence of other species) was selected by means of tetrachoric correlation analysis and on the basis of the improvement it provided over models based on environmental variables only. The prediction accuracy of the L. aula model based on environmental variables only was improved by the addition of occurrence data for A. arborella and S. erythrophthalmus. While biotic information was needed to train the ANNs, the final cascaded ANN model was able to predict L. aula better than a conventional ANN using environmental variables only as inputs. Results highlighted that biotic information provided by occurrence estimates for non-target species whose distribution can be more easily and accurately modeled may play a very useful role, providing additional predictive variables to target species distribution models.
Year: 2018 PMID: 29545613 PMCID: PMC5854617 DOI: 10.1038/s41598-018-22761-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Sampling sites. Veneto river basins, NE Italy. (a) Elevation map of the river basins. Black dots mark the positions of the sampling sites. (b) L. aula occurrence in the river basins. Green dots mark presence, red dots both presence and absence (same site, different times), and black dots absence. Images were obtained by using QGIS software[51] (http://grass.osgeo.org). Original image was generated by Michele Scardi and then processed by Emanuele Gandola using Adobe Photoshop CS6 (Version 13.0).
List of the fish species in the Veneto data set.
| N | Scientific name | English name |
|---|---|---|
| 1 | | (Triotto) |
| 2 | | Padanian Goby |
| 3 | | Rudd |
| 4 | | European Pike |
| 5 | | Chub |
| 6 | | Bleak |
| 7 | | Bullhead |
| 8 | | Tench |
| 9 | | Spined Loach |
| 10 | | Minnow |
| 11 | | European Eel |
| 12 | | Italian Spring Goby |
| 13 | | Marble Trout |
| 14 | | Italian Loach |
| 15 | | Black Bullhead |
| 16 | | Pumpkinseed |
| 17 | | Italian Barbel |
| 18 | | South Europe Nase |
| 19 | | Three-spined Stickleback |
| 20 | | Crucian Carp |
| 21 | | Gudgeon |
| 22 | | Blageon |
| 23 | | Grayling |
| 24 | | Po Brook Lamprey |
| 25 | | Eastern Mosquitofish |
| 26 | | Mediterranean Barbel |
| 27 | | Large-Mouthed Bass |
| 28 | | Perch |
| 29 | | Common Bream |
| 30 | | Common Carp |
| 31 | | Brook Char |
| 32 | | Sea Trout |
| 33 | | Rainbow Trout |
| 34 | | Sea Trout-Marble Trout hybrid |
Taxa on white background were used in the models while grey background highlights the excluded species. Scientific names were revised according to the current classification. The Italian name is shown in brackets for the only species with no English name. *Taxa excluded since their presence records were <10. **Taxa excluded regardless of their rarity because their occurrence depends on stocking programmes.
Environmental descriptors used as input (i.e. predictive) variables.
| Variable | Min | Max | Mean | Median |
|---|---|---|---|---|
| Elevation (m) | 13.00 | 1785.00 | 400.92 | 260.00 |
| Mean depth (m) | 0.01 | 1.46 | 0.45 | 0.40 |
| Runs (area, %) | 0.00 | 100.00 | 55.14 | 55.00 |
| Pools (area, %) | 0.00 | 90.00 | 14.79 | 5.56 |
| Riffles (area, %) | 0.00 | 100.00 | 30.00 | 22.03 |
| Mean width (m) | 1.00 | 80.00 | 9.32 | 6.00 |
| Boulders (area, %) | 0.00 | 100.00 | 17.01 | 10.00 |
| Rocks and pebbles (area, %) | 0.00 | 100.00 | 29.97 | 30.00 |
| Gravel (area, %) | 0.00 | 96.00 | 21.48 | 15.00 |
| Sand (area, %) | 0.00 | 80.00 | 7.99 | 4.50 |
| Silt and clay (area, %) | 0.00 | 100.00 | 23.44 | 0.00 |
| Stream velocity (score, 0–5) | 0.00 | 5.00 | 0.00 | 0.00 |
| Vegetation cover (area, %) | 0.00 | 100.00 | 10.85 | 0.00 |
| Shade (%) | 0.00 | 100.00 | 37.86 | 40.00 |
| Anthropic disturbance (score, 0–4) | 0.00 | 4.00 | 1.45 | 1.60 |
| pH | 5.63 | 9.33 | 7.75 | 7.76 |
| Conductivity (µS cm⁻¹) | 11.00 | 1851.00 | 406.63 | 390.00 |
| Gradient (%) | 0.02 | 41.60 | 4.38 | 1.38 |
| Catchment area (km²) | 0.34 | 3274.01 | 169.82 | 19.71 |
| Distance from source (km) | 0.33 | 119.27 | 16.79 | 7.14 |
Figure 2. Model development. The general procedure for training a cascaded ANN model involves four steps: 1) an ANN aimed at predicting the target species (y) is trained with the n environmental variables (x) only as inputs, and its output is analyzed to establish the baseline performance level; 2) p ANNs are trained to predict the same target species, using the same n environmental variables and an additional input based on the occurrence records of each of the p remaining species, one at a time, thus identifying the species whose addition as co-predictor provides the largest performance improvement relative to step 1; 3) an ANN aimed at assessing the expected occurrence of the most effective co-predictor species, according to step 2, is trained using the n environmental variables only as inputs; 4) a cascaded ANN model aimed at predicting the target species is obtained by combining the best ANN from step 2 and the one from step 3. The cascaded ANN model needs observed data for the environmental variables only, while biotic information is provided through sub-model predictions and therefore does not have to be collected to run the model. Green ANN input nodes require field data, while pink ANN nodes provide or require predicted values. Only a single co-predictor species is shown, but a very similar procedure can be applied if more co-predictor species are used.
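The combination in step 4 can be illustrated with a minimal sketch. It is not the authors' implementation: the two "models" below are hypothetical logistic functions with made-up weights standing in for trained ANNs, and the two normalized environmental inputs are placeholders. It only shows how the sub-model prediction replaces the observed biotic input, so that the cascade needs environmental data alone.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def copredictor_model(env):
    """Stand-in for the step 3 ANN: co-predictor occurrence from environment only."""
    # Hypothetical weights, not fitted to any data.
    return logistic(2.0 - 4.0 * env[0] + 1.0 * env[1])

def target_model(inputs):
    """Stand-in for the best step 2 ANN: n environmental inputs plus one biotic input."""
    *env, p_co = inputs
    # Hypothetical weights, not fitted to any data.
    return logistic(-1.5 - 2.0 * env[0] + 0.5 * env[1] + 3.0 * p_co)

def make_cascade(sub_model, final_model):
    """Step 4: feed the sub-model prediction to the final model as the biotic input."""
    def predict(env):
        p_co = sub_model(env)                   # predicted, not observed, occurrence
        return final_model(list(env) + [p_co])
    return predict

cascade = make_cascade(copredictor_model, target_model)

# Two hypothetical sites described by normalized environmental variables only.
lowland = [0.1, 0.6]
upland = [0.9, 0.2]
p_lowland = cascade(lowland)
p_upland = cascade(upland)
print(f"lowland: {p_lowland:.3f}  upland: {p_upland:.3f}")
```

With more than one co-predictor species, the same composition would apply with one sub-model per species, each contributing one extra input to the final model.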
Confusion matrix obtained by L. aula prediction on testing set.
| | Observed absence (0) | Observed presence (1) |
|---|---|---|
| Predicted absence (0) | 35 | 3 |
| Predicted presence (1) | 5 | 10 |
Figure 3. Results obtained by addition of correlated species. K values of the models obtained by adding one co-predictor species, plotted against the tetrachoric correlation coefficient of that species with L. aula. Species whose addition significantly increased the K value, i.e. raised it above the upper limit of the confidence interval of the model based on environmental variables only ([0.410, 0.805]), are marked in bold. Grey dots represent results from each fold in the 5-fold cross-validation. Image was obtained by using R software[31].
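As a rough illustration of the tetrachoric correlation between two presence/absence records, the sketch below uses the classical cosine approximation, r ≈ cos(π / (1 + √(ad/bc))), computed from the 2×2 co-occurrence counts. This is an assumption made for brevity: the exact coefficient requires fitting a bivariate normal distribution, the source does not state which estimator was used, and the counts in the example are invented.

```python
import math

def tetrachoric_approx(a, b, c, d):
    """Cosine approximation to the tetrachoric correlation of a 2x2 table.

    a = sites with both species, b = first species only,
    c = second species only, d = neither species.
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("correlation is undefined for this table")
    if b * c == 0:
        return 1.0    # no discordant sites in one direction: limit of the formula
    if a * d == 0:
        return -1.0   # no concordant sites in one direction: limit of the formula
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))

# Invented co-occurrence counts for a target and a candidate co-predictor species.
r_associated = tetrachoric_approx(20, 5, 5, 20)
r_independent = tetrachoric_approx(10, 10, 10, 10)
print(f"associated: {r_associated:.3f}  independent: {r_independent:.3f}")
```

An odds ratio of 1 (no association) maps to a correlation of zero, and the value approaches ±1 as the discordant or concordant cells empty out.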
Confusion matrix obtained by the addition of A. arborella observed occurrence as input variable.
| | Observed absence (0) | Observed presence (1) |
|---|---|---|
| Predicted absence (0) | 36 | 0 |
| Predicted presence (1) | 4 | 13 |
Confusion matrix obtained by the addition of S. erythrophthalmus observed occurrence as input variable.
| | Observed absence (0) | Observed presence (1) |
|---|---|---|
| Predicted absence (0) | 37 | 1 |
| Predicted presence (1) | 3 | 12 |
Confusion matrix obtained by the addition of A. arborella predicted occurrence as input variable.
| | Observed absence (0) | Observed presence (1) |
|---|---|---|
| Predicted absence (0) | 35 | 1 |
| Predicted presence (1) | 5 | 12 |
Confusion matrix obtained by the addition of S. erythrophthalmus predicted occurrence as input variable.
| | Observed absence (0) | Observed presence (1) |
|---|---|---|
| Predicted absence (0) | 36 | 2 |
| Predicted presence (1) | 4 | 11 |
Finally, another model was trained using both species presence probabilities as co-predictors, thus obtaining a K value of 0.765 (Table 8). The “profile”, “perturbation” and “weights” sensitivity analyses were performed on this model.
Confusion matrix obtained by the addition of both species predicted occurrences as input variables.
| | Observed absence (0) | Observed presence (1) |
|---|---|---|
| Predicted absence (0) | 36 | 1 |
| Predicted presence (1) | 4 | 12 |
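The K values can be recomputed from the confusion matrices above, assuming that K denotes Cohen's kappa (an assumption; the statistic is not defined in this excerpt). The sketch below is only a cross-check on the tabulated counts; the dictionary labels are descriptive names chosen here, not the authors' terminology.

```python
def cohen_kappa(m):
    """Cohen's kappa for a 2x2 matrix m, rows = predicted, columns = observed."""
    n = sum(sum(row) for row in m)
    p_observed = (m[0][0] + m[1][1]) / n                 # agreement actually obtained
    row_totals = [sum(row) for row in m]
    col_totals = [m[0][j] + m[1][j] for j in range(2)]
    p_expected = sum(row_totals[k] * col_totals[k] for k in range(2)) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)

# Counts copied from the six confusion matrices: [[TN, FN], [FP, TP]].
matrices = {
    "environmental variables only": [[35, 3], [5, 10]],
    "A. arborella, observed": [[36, 0], [4, 13]],
    "S. erythrophthalmus, observed": [[37, 1], [3, 12]],
    "A. arborella, predicted": [[35, 1], [5, 12]],
    "S. erythrophthalmus, predicted": [[36, 2], [4, 11]],
    "both species, predicted": [[36, 1], [4, 12]],
}

kappas = {name: cohen_kappa(m) for name, m in matrices.items()}
for name, k in kappas.items():
    print(f"{name}: K = {k:.3f}")
```

Under this assumption the last matrix gives K ≈ 0.764, close to the reported 0.765, and both models using observed co-predictor occurrences exceed the 0.805 upper confidence limit of the environment-only model.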
Figure 4. Lek’s “profile” method for sensitivity analysis. The expected probability of occurrence of L. aula (“Response”) at increasing values of each input variable, keeping all the other normalized inputs at five fixed levels ranging from 0 to 1 with a 0.25 step, thus generating five response curves. Images were obtained by using R software.
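The profile procedure described in the caption can be sketched as follows. The function and the toy model are hypothetical: any callable mapping a list of normalized inputs to a probability could take the place of the trained ANN, and the 11-point grid is an arbitrary choice.

```python
import math

def profile_curves(model, n_inputs, var_index, n_points=11,
                   levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Response curves for one input while all others are held at fixed levels."""
    grid = [i / (n_points - 1) for i in range(n_points)]   # 0 to 1, normalized scale
    curves = {}
    for level in levels:
        curve = []
        for value in grid:
            x = [level] * n_inputs    # every other input fixed at this level
            x[var_index] = value      # the profiled input sweeps the grid
            curve.append(model(x))
        curves[level] = curve
    return grid, curves

def toy_model(x):
    # Hypothetical stand-in for the ANN: response falls with x[0], rises with x[1].
    return 1.0 / (1.0 + math.exp(-(1.0 - 3.0 * x[0] + 2.0 * x[1])))

grid, curves = profile_curves(toy_model, n_inputs=2, var_index=0)
for level, curve in curves.items():
    print(f"others at {level:.2f}: response from {curve[0]:.3f} to {curve[-1]:.3f}")
```

Each of the five curves corresponds to one line in a profile plot for the chosen variable.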
Figure 5. “Perturbation” method for sensitivity analysis. Percent increase in mean square error of the ANN output obtained by perturbation of the test set data patterns. White noise in the [−0.3, 0.3] range was added to each value of each input variable, while keeping all the other inputs at their original values. Image was obtained by using R software.
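A minimal sketch of the perturbation procedure follows, under two assumptions that the caption does not settle: the noise is drawn from a uniform distribution over [−0.3, 0.3], and the error is measured against the observed presence/absence values. The model and the test patterns are invented for illustration.

```python
import math
import random

def mse(model, X, y):
    """Mean square error of the model output against 0/1 targets."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

def perturbation_increase(model, X, y, var_index, half_range=0.3, seed=0):
    """Percent increase in MSE after adding noise to a single input variable."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    X_noisy = []
    for x in X:
        x_new = list(x)               # all other inputs keep their original values
        x_new[var_index] += rng.uniform(-half_range, half_range)
        X_noisy.append(x_new)
    return 100.0 * (mse(model, X_noisy, y) - baseline) / baseline

def toy_model(x):
    # Hypothetical stand-in for the ANN: it uses x[0] and ignores x[1].
    return 1.0 / (1.0 + math.exp(-(4.0 - 8.0 * x[0])))

# Invented test patterns (normalized inputs) and observed occurrences.
X_test = [[0.1, 0.5], [0.2, 0.9], [0.8, 0.1], [0.9, 0.4], [0.4, 0.7]]
y_test = [1, 1, 0, 0, 0]

increase = [perturbation_increase(toy_model, X_test, y_test, i) for i in range(2)]
print(increase)
```

An input the model does not use yields no change in error, which is why the method ranks such variables as unimportant. With a single noise draw the change can be negative by chance, so averaging over repeated draws would be a reasonable refinement.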
Figure 6. “Weights” method for variable importance. Relative importance of input variables is assessed on the basis of ANN weights. Negative contributions of input variables imply negative correlation between predictive variables and L. aula occurrence (e.g. probability of presence is expected to be low at high elevation sites). Image was obtained by using R software.
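The caption does not name the algorithm used. Since the contributions are signed, one plausible reading is a connection-weights style calculation, in which each input's contribution is the sum over hidden nodes of the product of its input-to-hidden weight and the hidden-to-output weight. The sketch below assumes a single hidden layer and a single output node, and the weights are invented.

```python
def connection_weight_importance(w_in, w_out):
    """Signed relative importance of each input from the network weights.

    w_in[i][h] = weight from input i to hidden node h
    w_out[h]   = weight from hidden node h to the output node
    """
    raw = [sum(w_ih * w_ho for w_ih, w_ho in zip(row, w_out)) for row in w_in]
    total = sum(abs(r) for r in raw)
    return [r / total for r in raw]   # absolute values sum to 1, signs are kept

# Invented weights for a network with 3 inputs and 2 hidden nodes.
w_in = [
    [-1.2, 0.8],   # e.g. elevation
    [0.5, 0.3],    # e.g. mean depth
    [0.1, -0.2],   # e.g. shade
]
w_out = [1.5, -0.5]

importance = connection_weight_importance(w_in, w_out)
print([round(v, 3) for v in importance])
```

A negative value, such as the one for the first input here, corresponds to the case described in the caption, where higher values of the variable lower the expected probability of presence.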