Christopher J Paciorek, Simon J Goring, Andrew L Thurman, Charles V Cogbill, John W Williams, David J Mladenoff, Jody A Peters, Jun Zhu, Jason S McLachlan.
Abstract
We present a gridded 8 km-resolution data product of the estimated composition of tree taxa at the time of Euro-American settlement of the northeastern United States and the statistical methodology used to produce the product from trees recorded by land surveyors. Composition is defined as the proportion of stems larger than approximately 20 cm diameter at breast height for 22 tree taxa, generally at the genus level. The data come from settlement-era public survey records that are transcribed and then aggregated spatially, giving count data. The domain is divided into two regions, eastern (Maine to Ohio) and midwestern (Indiana to Minnesota). Public Land Survey point data in the midwestern region (ca. 0.8-km resolution) are aggregated to a regular 8 km grid, while data in the eastern region, from Town Proprietor Surveys, are aggregated at the township level in irregularly-shaped local administrative units. The product is based on a Bayesian statistical model fit to the count data that estimates composition on the 8 km grid across the entire domain. The statistical model is designed to handle data from both the regular grid and the irregularly-shaped townships and allows us to estimate composition at locations with no data and to smooth over noise caused by limited counts in locations with data. Critically, the model also allows us to quantify uncertainty in our composition estimates, making the product suitable for applications employing data assimilation. We expect this data product to be useful for understanding the state of vegetation in the northeastern United States prior to large-scale Euro-American settlement. In addition to specific regional questions, the data product can also serve as a baseline against which to investigate how forests and ecosystems change after intensive settlement. The data product is being made available at the NIS data portal as version 1.0.
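The aggregation step described in the abstract (point records binned to a regular 8 km grid, giving per-taxon counts and empirical proportions) can be sketched as follows. This is only an illustration under assumptions that are not in the source: coordinates are taken to be in metres in some projected system, and the function names `aggregate_to_grid` and `empirical_proportions` are hypothetical. It does not reproduce the authors' processing pipeline or the township-level aggregation used in the eastern region.

```python
import math
from collections import Counter, defaultdict

def aggregate_to_grid(xs, ys, taxa, cell_size=8000.0):
    """Count trees by taxon within square grid cells.

    xs, ys: projected coordinates in metres (assumed, not from the source)
    taxa:   taxon label for each recorded tree
    Returns a dict mapping (column, row) cell indices to a Counter of taxa.
    """
    counts = defaultdict(Counter)
    for x, y, taxon in zip(xs, ys, taxa):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell][taxon] += 1
    return counts

def empirical_proportions(cell_counts):
    """Convert one cell's taxon counts to raw (unsmoothed) proportions."""
    total = sum(cell_counts.values())
    return {taxon: n / total for taxon, n in cell_counts.items()}
```

Raw proportions of this kind are noisy where counts are small and undefined where there are no trees, which is the gap the Bayesian model described above is meant to fill.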
Year: 2016 PMID: 26918331 PMCID: PMC4768886 DOI: 10.1371/journal.pone.0150087
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Spatial domain of the northeastern United States, with locations with data shown in gray.
Locations are grid cells in the midwestern portion and townships in the eastern portion. Locations without data are shown in white, as are grid cells completely covered by water (e.g., a few locations in the northwestern portion of the domain in the states of Minnesota and Wisconsin).
Predictive ability based on several predictive metric criteria for the CAR and SPDE spatial models when holding out 95% of entire cells of data in Minnesota.
| Metric | Posterior mean of metric, CAR model | Posterior mean of metric, SPDE model | Posterior Prob. CAR < SPDE | Metric of posterior mean predictions, CAR model | Metric of posterior mean predictions, SPDE model |
|---|---|---|---|---|---|
| Brier | 0.819 | 0.844 | 0.98 | 0.738 | 0.733 |
| Negative Log Density | 466325 | 510383 | 1.00 | 394003 | 394554 |
| Mean Absolute Error | 0.0364 | 0.0383 | 0.98 | 0.0275 | 0.0269 |
| Root Mean Square Error | 0.0897 | 0.0960 | 0.97 | 0.0647 | 0.0627 |
Smaller values are better for all metrics.
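The page does not give formulas for the four metrics in the table, so the following is a minimal sketch under stated assumptions rather than the authors' definitions: the Brier score is taken to be the standard multi-category form averaged over held-out trees, the negative log density is taken to be the multinomial negative log-likelihood of the held-out counts without the combinatorial constant, and the absolute and squared errors compare predicted proportions with empirical proportions in held-out cells. All function names are hypothetical.

```python
import numpy as np

def brier_score(probs, counts):
    """Multi-category Brier score averaged over held-out trees (assumed form).

    probs:  (n_cells, n_taxa) predicted proportions, each row summing to 1
    counts: (n_cells, n_taxa) held-out tree counts per taxon
    """
    probs = np.asarray(probs, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Score for one tree of taxon k: sum_j (p_j - 1[j == k])^2
    #                              = sum_j p_j^2 - 2 p_k + 1
    sum_sq = (probs ** 2).sum(axis=1, keepdims=True)
    per_tree = sum_sq - 2.0 * probs + 1.0
    return (counts * per_tree).sum() / counts.sum()

def negative_log_density(probs, counts, eps=1e-12):
    """Multinomial negative log-likelihood, omitting the combinatorial term."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    counts = np.asarray(counts, dtype=float)
    return -(counts * np.log(probs)).sum()

def abs_and_rms_error(probs, counts):
    """Mean absolute and root mean square error against empirical proportions."""
    probs = np.asarray(probs, dtype=float)
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    keep = totals[:, 0] > 0  # empirical proportions are undefined without trees
    err = probs[keep] - counts[keep] / totals[keep]
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())
```

Under these definitions every metric is zero for a perfect prediction and grows as predictions move away from the held-out data, which matches the note that smaller values are better.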
Coverage and length of prediction intervals for the CAR and SPDE spatial models when holding out 95% of entire cells of data in Minnesota.
| | CAR model | SPDE model |
|---|---|---|
| Coverage | 0.977 | 0.978 |
| Mean Interval Length | 0.129 | 0.142 |
| Median Interval Length | 0.037 | 0.033 |
Coverage values near 0.95 are optimal, while shorter intervals are better.
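Coverage and interval length can be computed from posterior predictive draws along the following lines. This is a sketch, assuming equal-tailed intervals taken from sample quantiles and a hypothetical array layout of draws by held-out quantities; the page does not state how the authors constructed their intervals.

```python
import numpy as np

def interval_summary(draws, observed, level=0.95):
    """Summarize equal-tailed prediction intervals.

    draws:    (n_draws, n_items) posterior predictive samples (assumed layout)
    observed: (n_items,) held-out values being predicted
    """
    draws = np.asarray(draws, dtype=float)
    observed = np.asarray(observed, dtype=float)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(draws, alpha, axis=0)
    upper = np.quantile(draws, 1.0 - alpha, axis=0)
    covered = (observed >= lower) & (observed <= upper)
    lengths = upper - lower
    return {
        "coverage": covered.mean(),            # fraction of intervals containing the truth
        "mean_length": lengths.mean(),
        "median_length": np.median(lengths),
    }
```

A mean length well above the median length, as in the table, would indicate a minority of items with much wider intervals than the rest.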
Predictive ability based on several predictive metric criteria for the CAR and SPDE spatial models when holding out 80% of entire cells of data in Minnesota.
| Metric | Posterior mean of score, CAR model | Posterior mean of score, SPDE model | Posterior Prob. CAR < SPDE | Score of posterior mean predictions, CAR model | Score of posterior mean predictions, SPDE model |
|---|---|---|---|---|---|
| Brier | 0.773 | 0.765 | 0.10 | 0.710 | 0.710 |
| Negative Log Density | 355928 | 353987 | 0.25 | 311525 | 311902 |
| Mean Absolute Error | 0.0309 | 0.0296 | 0.10 | 0.0226 | 0.0223 |
| Root Mean Square Error | 0.0763 | 0.0739 | 0.02 | 0.0533 | 0.0530 |
Smaller values are better for all metrics.
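The two column groups in these tables differ in the order of averaging: one evaluates the metric for each posterior draw and then averages, the other averages the draws first and evaluates the metric once. The sketch below illustrates that distinction with mean absolute error. Treating "Posterior Prob. CAR < SPDE" as the fraction of paired draws in which the first model scores lower is an assumption, as are the function names and the array layout.

```python
import numpy as np

def mean_abs_error(pred, target):
    """Mean absolute error between predicted and target proportions."""
    return np.abs(np.asarray(pred) - np.asarray(target)).mean()

def summarize_metric(samples, target, metric=mean_abs_error):
    """Return (posterior mean of metric, metric of posterior mean, per-draw values).

    samples: (n_draws, n_cells, n_taxa) posterior draws of composition
    target:  (n_cells, n_taxa) held-out empirical proportions
    """
    samples = np.asarray(samples, dtype=float)
    per_draw = np.array([metric(draw, target) for draw in samples])
    return per_draw.mean(), metric(samples.mean(axis=0), target), per_draw

def prob_first_is_smaller(per_draw_a, per_draw_b):
    """Fraction of paired draws where model A scores lower (assumed definition)."""
    return np.mean(np.asarray(per_draw_a) < np.asarray(per_draw_b))
```

For a convex metric such as absolute or squared error, the metric of the posterior mean cannot exceed the posterior mean of the metric, which is consistent with the right-hand columns being smaller than the left-hand ones in the tables.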
Coverage and length of prediction intervals for the CAR and SPDE spatial models when holding out 80% of entire cells of data in Minnesota.
| | CAR model | SPDE model |
|---|---|---|
| Coverage | 0.981 | 0.972 |
| Mean Interval Length | 0.112 | 0.103 |
| Median Interval Length | 0.028 | 0.022 |
Coverage values near 0.95 are optimal, while shorter intervals are better.
Predictive ability based on several predictive metric criteria for the CAR and SPDE spatial models when holding out 5% of trees.
| Metric | Posterior mean of metric, CAR model | Posterior mean of metric, SPDE model | Posterior Prob. CAR < SPDE | Metric of posterior mean predictions, CAR model | Metric of posterior mean predictions, SPDE model |
|---|---|---|---|---|---|
| Brier | 0.662 | 0.661 | 0.07 | 0.657 | 0.657 |
| Negative Log Density | 51757 | 51626 | 0.01 | 50705 | 50736 |
Smaller values are better for all metrics.
Fig 2. Raw data, predictions, and uncertainty for select taxa.
Empirical proportions from raw data (column 1), predictions in the form of posterior means (column 2), and uncertainty estimates in the form of posterior standard deviations, which represent standard errors of prediction (column 3). In raw data plots, white indicates no data.
Fig 3. Predictions (posterior means) for all taxa over the entire domain.