Lei Zhang, Falk Huettmann, Xudong Zhang, Shirong Liu, Pengsen Sun, Zhen Yu, Chunrong Mi.
Abstract
Random forests (RF) is a powerful species distribution model (SDM) algorithm. This ensemble model by default can produce categorical and numerical species distribution maps based on its classification tree (CT) and regression tree (RT) algorithms, respectively. The CT algorithm can also produce numerical predictions (class probability). Here, we present a detailed procedure involving the use of the CT and RT algorithms using the RF method with presence-only data to model the distribution of species. CT and RT are used to generate numerical prediction maps, and then numerical predictions are converted to binary predictions through objective threshold-setting methods. We also applied simple methods to deal with collinearity of predictor variables and spatial autocorrelation of species occurrence data. A geographically stratified sampling method was employed for generating pseudo-absences. The detailed procedural framework is meant to be a generic method to be applied to virtually any SDM prediction question using presence-only data.
• How to use RF as a standard method for generic species distributions with presence-only data
• How to choose RF (CT or RT) methods for the distribution modeling of species
• A general and detailed procedure for any SDM prediction question.
Keywords: Binary prediction; Climate change; Forestation; Machine learning; Numerical prediction; Random forests models species distribution; Species traits; Threshold
Year: 2019 PMID: 31667128 PMCID: PMC6812352 DOI: 10.1016/j.mex.2019.09.035
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
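
The abstract mentions a geographically stratified sampling method for generating pseudo-absences but does not spell it out here. The sketch below shows one plausible reading, not the authors' R implementation: the study extent is divided into a regular grid and the same number of random points is drawn from each cell. The function name, the grid size, and the rule of skipping cells that already hold a presence record are all assumptions made for illustration.

```python
import random


def stratified_pseudo_absences(presences, extent, n_rows, n_cols, per_cell, seed=0):
    """Draw pseudo-absence points evenly across a grid of spatial strata.

    presences: list of (x, y) occurrence coordinates
    extent:    (xmin, xmax, ymin, ymax) of the study area
    Cells containing a presence record are skipped; this is an assumption
    of the sketch, not a rule taken from the paper.
    """
    xmin, xmax, ymin, ymax = extent
    cell_w = (xmax - xmin) / n_cols
    cell_h = (ymax - ymin) / n_rows

    def cell_of(x, y):
        # Clamp so points on the upper edge fall in the last row/column.
        col = min(int((x - xmin) / cell_w), n_cols - 1)
        row = min(int((y - ymin) / cell_h), n_rows - 1)
        return row, col

    occupied = {cell_of(x, y) for x, y in presences}
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            if (row, col) in occupied:
                continue
            x0 = xmin + col * cell_w
            y0 = ymin + row * cell_h
            for _ in range(per_cell):
                points.append((rng.uniform(x0, x0 + cell_w),
                               rng.uniform(y0, y0 + cell_h)))
    return points


# Toy example: a 4 x 4 grid over a 4 x 4 extent, two occupied cells.
presences = [(0.5, 0.5), (3.5, 3.5)]
pseudo_absences = stratified_pseudo_absences(
    presences, extent=(0, 4, 0, 4), n_rows=4, n_cols=4, per_cell=2
)
print(len(pseudo_absences))  # 14 free cells x 2 points each
```

Sampling per stratum keeps pseudo-absences from clustering in one part of the study area, which a single uniform draw over an irregular extent does not guarantee.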
Fig. 1 General framework for species distribution modeling by random forests (classification tree (CT) and regression tree (RT) algorithms) and R functions used in this study. Adapted from Zhang et al. [9]; * recommended methods.
Bioclimatic variables.
| Code | Variable |
|---|---|
| BIO1 | Annual Mean Temperature |
| BIO2 | Mean Diurnal Range (Mean of monthly (max temp–min temp)) |
| BIO3 | Isothermality (BIO2/BIO7) |
| BIO4 | Temperature Seasonality |
| BIO5 | Max Temperature of Warmest Month |
| BIO6 | Min Temperature of Coldest Month |
| BIO7 | Temperature Annual Range (BIO5-BIO6) |
| BIO8 | Mean Temperature of Wettest Quarter |
| BIO9 | Mean Temperature of Driest Quarter |
| BIO10 | Mean Temperature of Warmest Quarter |
| BIO11 | Mean Temperature of Coldest Quarter |
| BIO12 | Annual Precipitation |
| BIO13 | Precipitation of Wettest Month |
| BIO14 | Precipitation of Driest Month |
| BIO15 | Precipitation Seasonality (Coefficient of Variation) |
| BIO16 | Precipitation of Wettest Quarter |
| BIO17 | Precipitation of Driest Quarter |
| BIO18 | Precipitation of Warmest Quarter |
| BIO19 | Precipitation of Coldest Quarter |
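
Two of the variables in the table are defined from others: BIO7 = BIO5 - BIO6 and BIO3 = BIO2/BIO7. As a small worked example of those two formulas, the sketch below computes them for made-up temperature values. The function names and inputs are illustrative only. Some published climate layers store the BIO3 ratio multiplied by 100, so the units of any downloaded layer should be checked before comparing values.

```python
def temperature_annual_range(bio5, bio6):
    """BIO7: max temperature of warmest month minus min temperature of coldest month."""
    return bio5 - bio6


def isothermality(bio2, bio7):
    """BIO3 as defined in the table: mean diurnal range divided by annual range."""
    if bio7 == 0:
        raise ValueError("BIO7 is zero; isothermality is undefined")
    return bio2 / bio7


# Made-up values in degrees Celsius, for illustration only.
example_bio2 = 10.0   # mean diurnal range
example_bio5 = 30.0   # max temperature of warmest month
example_bio6 = -10.0  # min temperature of coldest month

example_bio7 = temperature_annual_range(example_bio5, example_bio6)
example_bio3 = isothermality(example_bio2, example_bio7)
print(example_bio7, example_bio3)  # 40.0 0.25
```

Because BIO3 and BIO7 are functions of other variables in the same table, they are natural candidates to examine when screening predictors for collinearity.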
Fig. 2 Generic guidelines on how to choose a random forests (classification or regression algorithm) method with presence-only data to model the distribution of species. Adapted from Zhang et al. [9].
Fig. 3 Differences in model accuracy between random forests regression tree (RT) and classification tree (CT) algorithms used for prediction of the distribution of Quercus serrata. Dots show the mean value across all species. Different letters indicate significant differences according to a Wilcoxon signed-ranks test (P < 0.05).
Fig. 4 Differences in prediction maps between random forests (RF) regression tree (RT) and classification tree (CT) algorithms for Quercus serrata. Numerical predictions were converted to binary predictions through objective threshold-setting methods (MaxTSS).
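
The MaxTSS conversion named in the Fig. 4 caption can be illustrated with a short sketch. It assumes the usual definition of the true skill statistic, TSS = sensitivity + specificity - 1, and picks the cut-off that maximizes it. The function names, the choice of candidate thresholds (each distinct predicted value), and the toy data are assumptions for illustration, not the R functions used in the study.

```python
def tss_at(threshold, scores, labels):
    """True skill statistic when scores >= threshold are called presences."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_presence = score >= threshold
        if label == 1:
            if predicted_presence:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_presence:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def max_tss_threshold(scores, labels):
    """Return (threshold, TSS) for the candidate threshold with the highest TSS."""
    candidates = sorted(set(scores))  # assumption: test each distinct score
    best = max(candidates, key=lambda t: tss_at(t, scores, labels))
    return best, tss_at(best, scores, labels)


# Toy numerical predictions (e.g. CT class probabilities or RT outputs)
# with 1 = presence and 0 = pseudo-absence.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, tss = max_tss_threshold(scores, labels)
binary = [1 if s >= threshold else 0 for s in scores]
print(threshold, binary)  # 0.6 [0, 0, 0, 0, 1, 1, 1, 1]
```

The same routine applies to either algorithm's numerical output, since it only needs a score per site and the known presence or pseudo-absence label.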
| Subject Area: | Agricultural and Biological Sciences |
| More specific subject area: | Species distribution modelling |
| Method name: | Random forests models species distribution |
| Name and reference of original method: | Zhang, L., Huettmann, F., Liu, S., Sun, P., Yu, Z., Zhang, X., Mi, C., 2019. Classification and regression with random forests as a standard method for presence-only data SDMs: A future conservation example using China tree species. Ecological Informatics, 52, 46–56. |
| Resource availability: | R software |