J Sakari Salonen, Mikko Korpela, John W Williams, Miska Luoto.
Abstract
We test several quantitative algorithms as palaeoclimate reconstruction tools for North American and European fossil pollen data, using both classical methods and newer machine-learning approaches based on regression tree ensembles and artificial neural networks. We focus on the reconstruction of secondary climate variables (here, January temperature and annual water balance), as their smaller ecological influence relative to the primary variable (July temperature) poses special challenges for palaeo-reconstructions. We test the pollen–climate models using a novel and comprehensive cross-validation approach, running a series of h-block cross-validations with h values of 100–1500 km. Our study illustrates major benefits of this variable h-block cross-validation scheme: the effect of spatial autocorrelation is minimized, while the cross-validations with increasing h values can reveal instabilities in the calibration model and approximate the challenges faced in palaeo-reconstructions with poor modern analogues. We achieve well-performing calibration models for both primary and secondary climate variables, with boosted regression trees providing the most robust performance overall, while the palaeoclimate reconstructions from fossil datasets show major independent features for the primary and secondary variables. Our results suggest that with careful variable selection and consideration of ecological processes, robust reconstruction of both primary and secondary climate variables is possible.
Year: 2019 PMID: 31676769 PMCID: PMC6825136 DOI: 10.1038/s41598-019-52293-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Modern and fossil pollen datasets in North America and Europe.
Fossil pollen datasets used for the study.
| Site | # Samples | Time range (cal. ka) |
|---|---|---|
| Deep Lake, Minnesota | 62 | 0.2–11.2 |
| Moon Lake, North Dakota | 170 | 0–14.0 |
| Laihalampi, Finland | 150 | 0–11.0 |
| Sokli, Finland | 217 | 117.4–130.3 |
Modelling tools used and their parameterization.
| Code | Method | Parameters |
|---|---|---|
| MAT | Modern analogue technique | Weighted mean of 5 closest analogues |
| WA | Weighted averaging | Monotonic deshrinking, tolerance down-weighting, square-root transformation of species data |
| WAPLS | Weighted averaging-partial least squares | 3-component models, square-root transformation of species data |
| RF | Random forest | 100 trees |
| ETREES | Extremely Randomized Trees | Number of random cuts = 5 |
| BRT | Boosted regression trees | Maximum number of trees = 3000, learning rate = 0.025, tree complexity = 4, bagging fraction = 0.5 |
| NNET | Artificial neural network | Linear output units. Number of units in the hidden layer = 18 (Europe, |
| ELM | Extreme Learning Machine | Prediction with a mean of 5 networks. Positive linear activation function. Number of units in the hidden layer = 180 (Europe, |
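As an illustration of the simplest entry in the table, the modern analogue technique (MAT) with a weighted mean of the 5 closest analogues can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, squared chord distance is assumed as the dissimilarity measure (it is the compositional distance named in the Figure 3 caption), and inverse-distance weighting is an assumption, since the table does not state the weighting scheme.

```python
import math


def squared_chord_distance(p, q):
    """Squared chord distance between two pollen spectra given as proportions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def mat_predict(fossil, modern_spectra, modern_climate, k=5):
    """Predict a climate value as the weighted mean of the k closest modern analogues."""
    # Pair each modern sample's distance to the fossil spectrum with its climate value
    # and keep the k nearest.
    nearest = sorted(
        (squared_chord_distance(fossil, spectrum), value)
        for spectrum, value in zip(modern_spectra, modern_climate)
    )[:k]
    # Assumption: inverse-distance weights; the small constant avoids division by zero
    # when a modern sample is identical to the fossil sample.
    weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
    return sum(w * value for w, (_, value) in zip(weights, nearest)) / sum(weights)


# Hypothetical two-taxon spectra and July temperatures, for illustration only.
modern_spectra = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
modern_tjul = [10.0, 15.0, 5.0]
estimate = mat_predict([0.5, 0.5], modern_spectra, modern_tjul, k=2)
```

Because the fossil spectrum in the usage example matches the first modern sample exactly, the estimate is dominated by that sample's value.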
Figure 2. Spatial distribution and correlation of reconstructed variables. The maps show the modern values for (a) July mean air temperature (Tjul), (b) January mean air temperature (Tjan), and (c) annual water balance. Panel (d) shows observed values and the Spearman correlation (ρ) for the reconstructed climate variables for the European and North American pollen–climate calibration datasets.
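For readers unfamiliar with the statistic in panel (d), Spearman's ρ is the Pearson correlation of the ranks of two variables. The following is a minimal sketch with hypothetical function names and illustrative values; tied observations receive their average rank.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: a monotonic but non-linear relationship.
rho = spearman_rho([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0])
```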
Figure 3. Cross-validation (CV) results. Results are shown for a series of h-block CVs, with h increasing from 0 to 1500 km in 100 km increments (note: h = 0 is equivalent to leave-one-out CV). Orange bars indicate the best h, estimated from the range of a variogram fitted to the residuals of a weighted averaging (WA) model. (a) Root-mean-square error of prediction (RMSEP) for the primary variable in North America and Europe (July mean temperature). (b) RMSEP for the secondary variable (water balance in North America, January mean temperature in Europe). (c) Loss of calibration data in CV models, shown for individual sites and as the median for all sites. (d) Compositional distance (squared chord distance) to the best pollen analogue in CV models, shown for individual sites and as the median for all sites.
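The h-block scheme described in this caption can be sketched as follows: when predicting a given calibration sample, every other sample closer than h km is removed from the training set, so h = 0 reduces to leave-one-out CV. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, distances are great-circle (haversine) distances on a spherical Earth, and `fit_predict` is a hypothetical callback standing in for any of the calibration methods.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def h_block_training_indices(coords, test_index, h_km):
    """Indices of calibration samples kept when predicting one test sample."""
    return [
        i for i, c in enumerate(coords)
        if i != test_index and great_circle_km(coords[test_index], c) >= h_km
    ]


def calibration_loss(coords, test_index, h_km):
    """Fraction of the other calibration samples removed by the h-block."""
    kept = len(h_block_training_indices(coords, test_index, h_km))
    return 1.0 - kept / (len(coords) - 1)


def h_block_rmsep(coords, spectra, climate, h_km, fit_predict):
    """RMSEP over all samples that still have training data at this h."""
    squared_errors = []
    for i in range(len(coords)):
        train = h_block_training_indices(coords, i, h_km)
        if not train:
            continue  # no calibration data left for this sample
        prediction = fit_predict(
            [spectra[j] for j in train], [climate[j] for j in train], spectra[i]
        )
        squared_errors.append((prediction - climate[i]) ** 2)
    if not squared_errors:
        return float("nan")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sites: two about 56 km apart, one on another continent.
coords = [(60.0, 25.0), (60.5, 25.0), (45.0, -93.0)]
loo_train = h_block_training_indices(coords, 0, 0)      # h = 0: leave-one-out
block_train = h_block_training_indices(coords, 0, 100)  # h = 100 km
```

Running the same loop for h = 0, 100, ..., 1500 km gives one RMSEP per h, and `calibration_loss` gives the per-site quantity plotted in panel (c).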
Table 3. Cross-validated performance metrics for the individual pollen–climate calibration models.
| Rank | N. America Tjul: method | RMSEP (°C) | Max. bias (°C) | N. America water balance: method | RMSEP (mm) | Max. bias (mm) | Europe Tjul: method | RMSEP (°C) | Max. bias (°C) | Europe Tjan: method | RMSEP (°C) | Max. bias (°C) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ETREES | 1.73 | 6.87 | BRT | 161.55 | 529.50 | BRT | 1.61 | 5.59 | BRT | 2.92 | 7.16 |
| 2 | BRT | 1.75 | 6.91 | RF | 161.92 | 570.55 | RF | 1.61 | 7.75 | ETREES | 3.08 | 9.36 |
| 3 | RF | 1.84 | 7.92 | ETREES | 162.14 | 557.45 | ETREES | 1.63 | 7.81 | RF | 3.19 | 9.66 |
| 4 | MAT | 1.89 | 5.14 | WAPLS | 171.97 | 595.71 | WA | 1.72 | 7.47 | WAPLS | 3.28 | 10.29 |
| 5 | WAPLS | 1.96 | 7.88 | MAT | 190.71 | 574.10 | WAPLS | 1.73 | 5.90 | ELM | 3.37 | 10.00 |
| 6 | WA | 2.18 | 8.71 | ELM | 194.05 | 612.41 | MAT | 1.82 | 8.78 | WA | 3.46 | 13.48 |
| 7 | NNET | 2.29 | 7.54 | WA | 216.10 | 676.09 | ELM | 1.98 | 6.35 | MAT | 3.47 | 6.73 |
| 8 | ELM | 2.37 | 6.83 | NNET | 231.63 | 660.83 | NNET | 2.15 | 7.77 | NNET | 4.03 | 7.99 |
Results are shown for eight models and for primary and secondary climate variables in each calibration dataset (North America and Europe), using h-block cross-validation with h determined by the variogram-range method. The metrics shown for each model are root-mean-square error of prediction (RMSEP) and maximum (Max.) bias. Models are ranked based on increasing RMSEP.
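The two metrics in the table can be computed from observed and cross-validated predicted values as sketched below. This is a minimal sketch with hypothetical function names. The maximum-bias definition is an assumption: it follows the convention common in palaeoecological calibration, where the observed gradient is split into 10 equal-width intervals and the largest absolute mean residual among them is reported. The source does not state which definition it uses.

```python
import math


def rmsep(observed, predicted):
    """Root-mean-square error of prediction."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def max_bias(observed, predicted, n_bins=10):
    """Largest absolute mean residual over equal-width bins of the observed gradient."""
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for o, p in zip(observed, predicted):
        # The maximum observed value is placed in the last bin.
        index = min(int((o - lo) / width), n_bins - 1) if width > 0 else 0
        bins[index].append(p - o)
    # Empty bins are ignored.
    return max(abs(sum(b) / len(b)) for b in bins if b)


# Illustrative values only.
observed = [0.0, 5.0, 10.0]
predicted = [1.0, 5.0, 8.0]
error = rmsep(observed, predicted)
bias = max_bias(observed, predicted)
```

A model can rank well on RMSEP yet show a large maximum bias if its errors concentrate at one end of the gradient, which is why the table reports both.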
Figure 4. Palaeoclimate reconstructions. Reconstructions are shown for primary and secondary climate variables and prepared with eight calibration methods from each fossil dataset. The black dashed lines indicate the modern climate values at the fossil sites. The SiZer maps (lower panels) show the significant features of the reconstructions, based on the curve using the calibration method with the strongest CV performance for the climate variable in question (BRT or ETREES; Table 3). The reconstruction is smoothed at different bandwidths, with the bandwidth used at each point on the vertical axis indicated by the horizontal distance between the white lines. For each point in time and each bandwidth (h), red indicates a significant rising trend, blue a significant falling trend, purple a lack of a significant trend, and grey a lack of sufficient data for meaningful inference.
Relative contribution of the ten most important predictor taxa for the boosted regression tree pollen–climate models for July mean temperature (Tjul), January mean temperature (Tjan), and water balance.
| Rank | N. America Tjul: taxon | % | N. America water balance: taxon | % | Europe Tjul: taxon | % | Europe Tjan: taxon | % |
|---|---|---|---|---|---|---|---|---|
| 1 | | 53.5 | | 29.7 | | 20.3 | | 17.3 |
| 2 | Cyperaceae | 9.9 | | 12.6 | Cyperaceae | 7.9 | | 13.4 |
| 3 | | 6.0 | | 8.8 | | 6.9 | | 12.3 |
| 4 | | 4.2 | Chenopodiaceae | 4.5 | | 6.0 | | 11.9 |
| 5 | Chenopodiaceae | 4.0 | | 4.5 | | 5.3 | Poaceae | 8.2 |
| 6 | | 3.5 | Lycopodiaceae | 4.5 | | 5.1 | | 3.7 |
| 7 | | 2.4 | | 3.9 | Polypodiaceae | 4.2 | | 3.5 |
| 8 | | 1.7 | | 3.7 | | 3.8 | Ericaceae | 2.9 |
| 9 | | 1.7 | | 2.6 | Ericaceae | 3.3 | Polypodiaceae | 2.2 |
| 10 | Ericaceae | 1.6 | | 2.3 | Chenopodiaceae | 3.2 | | 2.0 |