Louise Hill, Andy Hector, Gabriel Hemery, Simon Smart, Matteo Tanadini, Nick Brown.
Abstract
High-quality abundance data are expensive and time-consuming to collect and often highly limited in availability. Nonetheless, accurate, high-resolution abundance distributions are essential for many ecological applications ranging from species conservation to epidemiology. Producing models that can predict abundance well, with good resolution over large areas, has therefore been an important aim in ecology, but poses considerable challenges. We present a two-stage approach to modeling abundance, combining two established techniques. First, we produce ensemble species distribution models (SDMs) of trees in Great Britain at a fine resolution, using much more common presence-absence data and key environmental variables. We then use random forest regression to predict abundance by linking the results of the SDMs to a much smaller amount of abundance data. We show that this method performs well in predicting the abundance of 20 of 25 tested British tree species, a group that is generally considered challenging for modeling distributions due to the strong influence of human activities. Maps of predicted tree abundance for the whole of Great Britain are provided at 1 km² resolution. Abundance maps have a far wider variety of applications than presence-only maps, and these maps should allow improvements to aspects of woodland management and conservation including analysis of habitats and ecosystem functioning, epidemiology, and disease management, providing a useful contribution to the protection of British trees. We also provide complete R scripts to facilitate application of the approach to other scenarios.
Keywords: abundance distributions; abundance–occupancy relationships; biotic effects; mapping
Year: 2017 PMID: 28303176 PMCID: PMC5306018 DOI: 10.1002/ece3.2661
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Ecological variables downloaded and produced for species distribution modeling. Details of data sources can be found in Data Accessibility
| Variable | Description | Unit | Source |
|---|---|---|---|
| bio1 | Annual mean temperature | °C × 10 | Worldclim |
| bio2 | Mean diurnal temperature range: mean of monthly (max temp − min temp) | °C × 10 | Worldclim |
| bio3 | Isothermality (bio2/bio7 × 100) | % | Worldclim |
| bio4 | Temperature seasonality: standard deviation × 100 | °C × 1000 | Worldclim |
| bio5 | Max temperature of warmest month | °C × 10 | Worldclim |
| bio6 | Min temperature of coldest month | °C × 10 | Worldclim |
| bio7 | Temperature annual range | °C × 10 | Worldclim |
| bio8 | Mean temperature of wettest quarter | °C × 10 | Worldclim |
| bio9 | Mean temperature of driest quarter | °C × 10 | Worldclim |
| bio10 | Mean temperature of warmest quarter | °C × 10 | Worldclim |
| bio11 | Mean temperature of coldest quarter | °C × 10 | Worldclim |
| bio12 | Annual precipitation | mm | Worldclim |
| bio13 | Precipitation of wettest month | mm | Worldclim |
| bio14 | Precipitation of driest month | mm | Worldclim |
| bio15 | Precipitation seasonality: coefficient of variation | % | Worldclim |
| bio16 | Precipitation of wettest quarter | mm | Worldclim |
| bio17 | Precipitation of driest quarter | mm | Worldclim |
| bio18 | Precipitation of warmest quarter | mm | Worldclim |
| bio19 | Precipitation of coldest quarter | mm | Worldclim |
| altitude | Altitude | m × 10 | Worldclim |
| slope | Slope | % | Derived from Altitude using ArcGIS (Slope) |
| aspect | Aspect | Degrees | Derived from Altitude using ArcGIS (Slope) |
| directradiat | Direct radiation: incoming direct solar radiation | Watt hr m−2 | Derived from Altitude using ArcGIS (Solar Radiation Analysis) |
| directdurat | Direct duration: duration of direct solar radiation | Hours | Derived from Altitude using ArcGIS (Solar Radiation Analysis) |
| diffuseradiat | Diffuse radiation: incoming scattered solar radiation | Watt hr m−2 | Derived from Altitude using ArcGIS (Solar Radiation Analysis) |
| nfi | National Forest Inventory Great Britain 2014, forested areas | Nominal | Forestry Commission |
| soil | Soil type | Nominal | European Soil Database |
| soiltext | Dominant soil surface textural class | Nominal | European Soil Database |
| octop | Topsoil organic carbon content | Nominal | European Soil Database |
| awctop | Topsoil available water capacity | Nominal | European Soil Database |
| mintop | Topsoil mineralogy | Nominal | European Soil Database |
| ancient_es | Ancient woodlands in England, Scotland and Wales | Nominal | Natural England, Forestry Commission Scotland and Natural Resources Wales |
| land cover 07 | UK Land cover map 2007 (1 km²) | Nominal | Countryside Survey/CEH |
The number, type, and prediction accuracy of the individual models used to build ensemble distribution models for each tree species. Algorithms included were GAM (generalized additive model), GBM (generalized boosted regression), GLM (generalized linear model), RF (Random Forest), and MaxEnt (Maximum Entropy).
| Species | Number of models used to build ensemble | Algorithms included | Mean ROC score | Mean TSS score |
|---|---|---|---|---|
| | 20 | GAM, RF, GBM | 0.92 | 0.71 |
| | 20 | GLM, GAM, RF, GBM | 0.76 | 0.44 |
| | 20 | GAM, RF, GBM | 0.85 | 0.55 |
| | 15 | RF | 0.80 | 0.46 |
| | 15 | RF | 0.79 | 0.46 |
| | 15 | RF | 0.80 | 0.46 |
| | 20 | RF, GBM, MaxEnt | 0.78 | 0.40 |
| | 15 | RF | 0.81 | 0.47 |
| | 16 | RF, GBM | 0.86 | 0.46 |
| | 20 | GLM, GBM, RF, GBM | 0.96 | 0.82 |
| | 20 | GAM, RF, GBM | 0.81 | 0.48 |
| | 20 | GLM, GAM, RF, GBM | 0.92 | 0.83 |
| | 17 | RF, GBM | 0.71 | 0.31 |
| | 11 | RF | 0.75 | 0.36 |
| | 20 | RF, GBM | 0.80 | 0.48 |
| | 19 | GAM, RF, GBM, MaxEnt | 0.76 | 0.39 |
| | 15 | RF | 0.82 | 0.49 |
| | 16 | RF, GBM | 0.90 | 0.64 |
| | 16 | RF, GBM | 0.79 | 0.42 |
| | 16 | RF, GBM | 0.78 | 0.42 |
| | 20 | RF, GBM | 0.84 | 0.53 |
| | 20 | GAM, RF, GBM | 0.80 | 0.44 |
| | 15 | RF, GBM | 0.76 | 0.36 |
| | 15 | RF | 0.79 | 0.43 |
| | 15 | RF | 0.89 | 0.61 |
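The two accuracy scores reported in the table above can be illustrated with a minimal sketch. It assumes that the "ROC score" is the area under the ROC curve and that TSS (true skill statistic) is sensitivity plus specificity minus one at a fixed probability threshold. The function names, the 0.5 threshold, and the toy data are illustrative assumptions, not taken from the authors' scripts.

```python
def true_skill_statistic(observed, predicted_prob, threshold=0.5):
    """TSS = sensitivity + specificity - 1 at the given threshold (assumed 0.5)."""
    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, predicted_prob):
        pred = prob >= threshold
        if obs and pred:
            tp += 1
        elif obs and not pred:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)   # share of true presences predicted present
    specificity = tn / (tn + fp)   # share of true absences predicted absent
    return sensitivity + specificity - 1.0


def roc_auc(observed, predicted_prob):
    """Area under the ROC curve: the probability that a randomly chosen
    presence gets a higher score than a randomly chosen absence (ties count half)."""
    pos = [p for o, p in zip(observed, predicted_prob) if o]
    neg = [p for o, p in zip(observed, predicted_prob) if not o]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy presence-absence records and model probabilities (made up for illustration).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

tss = true_skill_statistic(observed, probs)
auc = roc_auc(observed, probs)
print(f"TSS = {tss:.3f}, AUC = {auc:.4f}")
```

TSS ranges from -1 to 1 and AUC from 0 to 1, so the two columns in the table are not on the same scale and should be compared across species rather than with each other.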
Figure 1. Schematic showing the outline of the two‐stage method for predicting abundance distributions. The first stage uses SDMs to produce maps of predicted probability of occupancy, while the second stage takes these maps as inputs and uses Random Forest regression to produce maps of predicted abundance. Distribution data inputs are shown in square boxes and model covariates in round boxes, and model outputs are shaded in solid gray and modeling processes in hashed gray.
Figure 2. Observed abundance against abundance predicted by Random Forest regression, as used to assess model performance, shown for four tree species. The line on each graph is the 1:1 line showing perfect model fit.
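The two-stage method can be sketched in a few lines. This is a rough Python illustration on synthetic data, not the authors' R workflow: a single random forest classifier stands in for the ensemble SDM, scikit-learn is assumed to be available, and every variable name and number (`env`, `tree_cover`, the sample sizes) is a made-up assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a grid of 1 km cells (all values are made up).
n_cells = 2000
env = rng.normal(size=(n_cells, 3))          # hypothetical environmental covariates
tree_cover = rng.uniform(0, 100, n_cells)    # hypothetical total tree cover, ha per km^2

# Made-up "truth": occupancy depends on the first covariate, and abundance
# is a share of tree cover wherever the species is present.
true_prob = 1.0 / (1.0 + np.exp(-2.0 * env[:, 0]))
presence = rng.random(n_cells) < true_prob
abundance = np.where(presence, tree_cover * rng.uniform(0, 0.5, n_cells), 0.0)

# Stage 1: fit a distribution model to the plentiful presence-absence records
# and predict probability of occupancy for every cell.
pa_idx = rng.choice(n_cells, size=1500, replace=False)
sdm = RandomForestClassifier(n_estimators=100, random_state=0)
sdm.fit(env[pa_idx], presence[pa_idx])
p_occ = sdm.predict_proba(env)[:, 1]         # column 1 = probability of presence

# Stage 2: link stage-1 output (plus tree cover) to the much smaller
# abundance sample with random forest regression, then predict everywhere.
ab_idx = rng.choice(n_cells, size=300, replace=False)
stage2_features = np.column_stack([p_occ, tree_cover])
abundance_model = RandomForestRegressor(n_estimators=200, random_state=0)
abundance_model.fit(stage2_features[ab_idx], abundance[ab_idx])
predicted_abundance = abundance_model.predict(stage2_features)

print(predicted_abundance[:5])
```

The design point the sketch tries to convey is that the abundance model never sees the raw environmental covariates directly. It works from the occupancy probabilities produced in stage 1, so the large presence-absence data set does most of the work of describing where the species can occur.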
Root‐mean‐square error (RMSE) and mean absolute error (MAE) scores for the Random Forest regression model for each species. The number of nonzero data points available for each species is also shown
| Species | RMSE | MAE | Number of nonzero data points |
|---|---|---|---|
| | 1.44 | 0.35 | 315 |
| | 1.27 | 0.19 | 42 |
| | 4.01 | 1.40 | 634 |
| | 2.40 | 0.66 | 195 |
| | 6.88 | 2.29 | 802 |
| | 4.09 | 1.09 | 127 |
| | 3.79 | 1.05 | 320 |
| | 9.56 | 3.58 | 501 |
| | 4.47 | 1.47 | 935 |
| | 1.10 | 0.23 | 339 |
| | 8.45 | 2.91 | 918 |
| | 4.95 | 1.88 | 1629 |
| | | | 16 |
| | 1.98 | 0.56 | 401 |
| | | | 9 |
| | 7.66 | 1.96 | 193 |
| | 5.99 | 1.84 | 209 |
| | 6.50 | 2.54 | 1867 |
| | 1.38 | 0.28 | 74 |
| | 0.16 | 0.03 | 55 |
| | | | 3 |
| | 2.21 | 0.49 | 86 |
| | 1.04 | 0.14 | 56 |
| | | | 22 |
| | | | 27 |
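For reference, the two error scores in the table above follow their standard definitions, sketched here in plain Python. The function names and the toy observed and predicted values are illustrative assumptions, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error: square root of the mean squared difference."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Toy abundances in hectares per km^2 (made up for illustration).
observed = [0.0, 2.0, 5.0, 10.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few cells have large errors. That is consistent with every row of the table, where RMSE is several times MAE.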
Figure 3. Maps of predicted abundance for four species, in hectares per km², or percent cover. Note the scale varies between species. Maps for all other successfully modeled species are available to download from the Sylva Foundation website and the Oxford University Research Archive (see Data Accessibility).
Figure 4. (a) Presence records of Acer campestre, downloaded from the BSBI database (some of the best available distribution data at the country‐wide level). The data are presence only on a 2 × 2 km (tetrad) scale. Note that where presence is not recorded, it is impossible to say whether the species is truly absent. Compare with our modeled abundance distribution (b) showing modeled hectares covered by A. campestre per square kilometer for every 1 km square.
| | Altitude | Aspect | Slope | Direct incoming solar radiation | Mean diurnal temperature range | Temperature seasonality | Annual precipitation | Topsoil available water capacity | Topsoil mineralogy | Topsoil organic carbon content | Topsoil texture class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Altitude | 1 | 0.015396 | 0.518652 | 0.591347 | −0.20693 | −0.03162 | 0.621356 | 0.231888 | 0.000805 | 0.303198 | 0.428471 |
| Aspect | 0.015396 | 1 | −0.02016 | 0.062394 | −0.05345 | −0.0269 | 0.062306 | 0.014767 | 0.019412 | 0.035133 | 0.041259 |
| Slope | 0.518652 | −0.02016 | 1 | 0.224245 | −0.08178 | 0.00709 | 0.331202 | 0.104236 | −0.01246 | 0.154548 | 0.180757 |
| Direct incoming solar radiation | 0.591347 | 0.062394 | 0.224245 | 1 | −0.30348 | −0.17634 | 0.627365 | 0.225541 | 0.055276 | 0.231702 | 0.407135 |
| Mean diurnal temperature range | −0.20693 | −0.05345 | −0.08178 | −0.30348 | 1 | 0.620974 | −0.57396 | −0.0862 | 0.068989 | −0.26488 | −0.27498 |
| Temperature seasonality | −0.03162 | −0.0269 | 0.00709 | −0.17634 | 0.620974 | 1 | −0.37328 | −0.13755 | 0.054918 | −0.21352 | −0.27218 |
| Annual precipitation | 0.621356 | 0.062306 | 0.331202 | 0.627365 | −0.57396 | −0.37328 | 1 | 0.27932 | 0.039224 | 0.36968 | 0.506227 |
| Topsoil available water capacity | 0.231888 | 0.014767 | 0.104236 | 0.225541 | −0.0862 | −0.13755 | 0.27932 | 1 | 0.152653 | 0.305173 | 0.590749 |
| Topsoil mineralogy | 0.000805 | 0.019412 | −0.01246 | 0.055276 | 0.068989 | 0.054918 | 0.039224 | 0.152653 | 1 | 0.142262 | 0.298536 |
| Topsoil organic carbon content | 0.303198 | 0.035133 | 0.154548 | 0.231702 | −0.26488 | −0.21352 | 0.36968 | 0.305173 | 0.142262 | 1 | 0.386809 |
| Topsoil texture class | 0.428471 | 0.041259 | 0.180757 | 0.407135 | −0.27498 | −0.27218 | 0.506227 | 0.590749 | 0.298536 | 0.386809 | 1 |
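The matrix above is symmetric with ones on the diagonal, which is what a table of pairwise correlation coefficients between covariates looks like. The source does not state which coefficient was used. The sketch below assumes Pearson's r and uses made-up covariate values purely to show how such a matrix is built.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named covariate columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for five grid cells (not data from the study).
covariates = {
    "altitude": [10, 50, 120, 300, 450],
    "annual_precipitation": [600, 650, 800, 1100, 1500],
    "mean_diurnal_range": [8.0, 7.5, 7.0, 6.0, 5.5],
}

matrix = correlation_matrix(covariates)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

A matrix like this is commonly used to check that no pair of covariates is so strongly correlated that one of them is redundant. In the table above the largest off-diagonal value is about 0.63.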
| Species | Most important variable in abundance model |
|---|---|
| | Cover of trees in NFI |
| | Probability of occupancy of |
| | Cover of trees in NFI |
| | Probability of occupancy of |
| | Cover of all trees |
| | Cover of trees outside of NFI |
| | Cover of all trees |
| | Cover of trees in NFI |
| | Cover of trees in NFI |
| | Probability of occupancy of |
| | Cover of all trees |
| | Cover of trees in NFI |
| | Probability of occupancy of |
| | Cover of trees in NFI |
| | Cover of trees outside of NFI |
| | Cover of trees in NFI |
| | Probability of occupancy of |
| | Probability of occupancy of |
| | Cover of trees in NFI |
| | Probability of occupancy of |
| Species | | RMSE | MAE | Number of data points per species | Number of nonzero data points |
|---|---|---|---|---|---|
| | .523 | 1.44 | 0.35 | 679 | 315 |
| | .207 | 1.27 | 0.19 | 444 | 42 |
| | .426 | 4.01 | 1.40 | 906 | 634 |
| | .271 | 2.40 | 0.66 | 484 | 195 |
| | .450 | 6.88 | 2.29 | 1,261 | 802 |
| | .596 | 4.09 | 1.09 | 501 | 127 |
| | .690 | 3.79 | 1.05 | 755 | 320 |
| | .764 | 9.56 | 3.58 | 982 | 501 |
| | .344 | 4.47 | 1.47 | 1,282 | 935 |
| | .049 | 1.10 | 0.23 | 413 | 339 |
| | .496 | 8.45 | 2.91 | 1,388 | 918 |
| | .397 | 4.95 | 1.88 | 1,986 | 1,629 |
| | .126 | NA | NA | 400 | 16 |
| | .589 | 1.98 | 0.56 | 886 | 401 |
| | .004 | NA | NA | 394 | 9 |
| | .596 | 7.66 | 1.96 | 600 | 193 |
| | .841 | 5.99 | 1.84 | 584 | 209 |
| | .462 | 6.50 | 2.54 | 2,303 | 1,867 |
| | .644 | 1.38 | 0.28 | 445 | 74 |
| | .081 | 0.16 | 0.03 | 392 | 55 |
| | .055 | NA | NA | 392 | 3 |
| | .372 | 2.21 | 0.49 | 518 | 86 |
| | .442 | 1.04 | 0.14 | 461 | 56 |
| | .037 | NA | NA | 394 | 22 |
| | .013 | NA | NA | 393 | 27 |