Maxime Bombrun, Jonathan P Dash, David Pont, Michael S Watt, Grant D Pearse, Heidi S Dungey.
Abstract
Advances in remote sensing, combined with sophisticated methods for large-scale data analytics from the field of data science, provide new ways to model complex interactions in biological systems. In a data-driven approach, expert insight is used to corroborate the results generated by analytical models rather than to lead the model design. Following such an approach, this study outlines the development and implementation of a whole-of-forest phenotyping system that incorporates spatial estimates of productivity across a large plantation forest. In large-scale plantation forestry, improving the productivity and consistency of future forests is an important but challenging goal because of the multiple interactions between biotic and abiotic factors, the long breeding cycle, and the high variability of growing conditions. Forest phenotypic expression is strongly affected by the interaction between environmental conditions and forest management, but these complex dynamics are incompletely understood. In this study, we collected an extensive set of 2.7 million observations composed of 62 variables describing climate, forest management, tree genetics, and fine-scale terrain, extracted from environmental surfaces, management records, and remotely sensed data. Using three machine learning methods, we compared models of forest productivity and evaluated gain and Shapley values for interpreting the influence of categorical variables on the power of these methods to predict forest productivity at a landscape level. The most accurate model identified the most important drivers of productivity as, in order of importance, genetics, environmental conditions, leaf area index, topography, and soil properties, thus describing the complex interactions within the forest. This approach demonstrates that new methods in remote sensing and data science enable powerful, landscape-level understanding of forest productivity.
The phenotyping method developed here can be used to identify superior and inferior genotypes and to estimate a productivity index for individual sites. This approach can improve tree breeding and the deployment of the right genetics to the right site, increasing overall productivity across planted forests.
Keywords: GPU-acceleration; LIDAR; forestry; decision trees; gradient boosting; phenotyping
Year: 2020 PMID: 32210980 PMCID: PMC7068454 DOI: 10.3389/fpls.2020.00099
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. The study forest location within New Zealand and the extent of the airborne laser scanning (ALS) data, outlined in red.
Silviculture variables of the dataset.
| GxExS trend | Feature name | Description |
|---|---|---|
| Silviculture | StandArea | Area of the stand. |
| Silviculture | Crop.Init.SPH | Stand density (stems per hectare) at which seedlings were established. |
| Silviculture | Rotation | The number of successive replantings that occurred on this stand. |
| Silviculture | ThinClass | Categorical variable representative of the thinning management regime, such as the number of thinnings or crown release. |
| Silviculture | ThinType | Categorical variable representative of the management method for thinning such as production thinning, waste thinning and waste thinning with low pruning. |
| Silviculture | ResidSPH | Residual stand density (stems per hectare) recorded in the stand inventory after harvesting. |
| Silviculture | PruneClass | Categorical variable representative of the pruning (removal of lower branches) management regime. |
| Silviculture | PruneSPH | The stand density of pruned stems. |
| Silviculture | PruneHt | Height (in m) to which the tree has been pruned. |
| Silviculture | MaxDOS | Maximum diameter of the stem over branch stubs at the point of pruning. |
| Silviculture | LastSPH | The stand density taken during the last inventory of the stand prior to harvesting. |
| Silviculture | ThinDate | Date of the last thinning. |
| Silviculture | PruneDate | Date of the last pruning. |
| Silviculture | Seedlot.Planting.Stock | Categorical variable representative of the type of management in the nursery, such as controlled and open pollination, clonal cuttings and plantlets, or seedling top cuttings. |
| Silviculture | Seedlot.Planting.Stock.Type | Categorical variable representative of the type of planting stock: container, bare-root or plug seedlings. |
Genetic variables of the dataset.
| GxExS trend | Feature name | Description |
|---|---|---|
| Genetic | SeedlotCod | Categorical variable representative of the seed family. |
| Genetic | Clone | Binary categorical variable indicating whether the tree is a clone. |
| Genetic | GF | Growth and form score. |
Environment variables of the dataset.
| GxExS trend | Feature name | Description |
|---|---|---|
| Environment | SiteIndex | In New Zealand, mean top height at age 20 years derived from Eq. 1. |
| Environment | Temp² | Mean daily temperature (°C). |
| Environment | glob.rad | Accumulated global solar radiation (MJ/m²). |
| Environment | sun.hrs² | Sunshine hours per day. |
| Environment | tot.rain² | Total rainfall (mm per day). |
| Environment | windspeed² | Mean wind speed (m/s) at 10 m above ground level over 24 hours. |
| Environment | Aspect³ | Local morphometric terrain parameters derived from multi-scale fitting based on ( |
| Environment | dtm_elev³ | Elevation of the terrain (m) calculated from the digital terrain model (DTM). |
| Environment | mid.slope.position³ | Mid-slope positions are assigned 0, whereas the maximum vertical distances from the mid-slope in both the valley and crest directions are assigned 1. |
| Environment | normalised.height³ | Normalised height assigns 1 to the highest and 0 to the lowest position within the respective reference area ( |
| Environment | sky.view.factor³ | Calculation of visible sky based on ( |
| Environment | slope.height³ | Difference in altitude between the pixel and the local channel. |
| Environment | slope³ | Local morphometric terrain parameters derived from multi-scale fitting based on ( |
| Environment | standardised.height³ | Normalised height multiplied by absolute height. |
| Environment | Terrain | Automated classification of topography calculated from the DTM. Based on the algorithm presented in ( |
| Environment | valley.depth | Valley depth calculated using the Top Hat algorithm presented in ( |
| Environment | vector.ruggedness | Vector ruggedness calculated following the algorithm presented in ( |
| Environment | visible.sky | Estimate of visible sky calculated using the Top Hat algorithm presented in ( |
| Environment | wetness.index | Topographic wetness index calculated using ( |
| Environment | wind.exp | Topographic assessment of wind exposure calculated based on the algorithm presented in ( |
| Environment | cn_rk5 | Carbon-to-nitrogen ratio, used as an indicator of fertility. |
| Environment | LAI | Leaf area index, a dimensionless quantity that is defined as the one-sided leaf area per unit ground area. Derived using the methods defined in ( |
| Environment | soil_final | Categorical variable representative of the soil composition following the New Zealand Soil Classification ( |
² These features are split into five variables representing annual, summer, autumn, winter, and spring averages over 30 years.
³ Features extracted from SAGA GIS (http://www.saga-gis.org/en/index.html) based on the DTM.
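The climate footnote describes how each climate feature is expanded into an annual average and four seasonal averages. A minimal sketch of that split, assuming (hypothetically) that the input is a list of 12 long-term monthly means, January first, and that seasons follow the Southern Hemisphere meteorological convention used in New Zealand (summer = December to February). The function name and the `Temp_summer`-style output names are illustrative and are not taken from the study.

```python
from statistics import fmean

# Southern Hemisphere meteorological seasons, as month indices (0 = January).
SEASONS = {
    "summer": (11, 0, 1),   # Dec, Jan, Feb
    "autumn": (2, 3, 4),    # Mar, Apr, May
    "winter": (5, 6, 7),    # Jun, Jul, Aug
    "spring": (8, 9, 10),   # Sep, Oct, Nov
}


def seasonal_features(name, monthly_means):
    """Split one climate variable into annual and seasonal averages.

    monthly_means: 12 long-term (e.g. 30-year) monthly means, January first.
    Returns a dict such as {"Temp_annual": ..., "Temp_summer": ..., ...}.
    Months are weighted equally (not by number of days).
    """
    if len(monthly_means) != 12:
        raise ValueError("expected 12 monthly values")
    features = {f"{name}_annual": fmean(monthly_means)}
    for season, months in SEASONS.items():
        features[f"{name}_{season}"] = fmean(monthly_means[m] for m in months)
    return features
```

Weighting months by their number of days would shift the annual value slightly; the equal weighting here is a simplifying assumption.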
Time and score comparisons for training the three gradient boosting (GB) methods on the training set.
| | XGBoost | CatBoost | LightGBM |
|---|---|---|---|
| Hyperparameters | max_depth: 4; learning_rate: 0.05; min_child_weight: 3; n_estimators: 6,877 | depth: 5; learning_rate: 0.1; l2_leaf_reg: 5; iterations: 10,369; one_hot_max_size: 10 | max_depth: 4; learning_rate: 0.1; num_leaves: 10; n_estimators: 6,571 |
| Training time | 119 sec | 518 sec | 406 sec |
| RMSE | 1.6968 | 1.6385 | 1.6736 |
| R² | 0.86 | 0.87 | 0.87 |
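All three libraries in the table build on the same gradient boosting loop: start from a constant prediction, repeatedly fit a small tree to the current residuals, and add that tree's output scaled by the learning rate. The sketch below shows this loop from scratch for squared-error loss, using depth-1 trees (stumps) on a single numeric feature. It is a simplified illustration of the technique, not the XGBoost, CatBoost or LightGBM implementation, which add regularisation (for example `min_child_weight` or `l2_leaf_reg`), deeper trees and faster split finding; all function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Return the (threshold, left_value, right_value) split of x that minimises
    the squared error of the residuals; points with x <= threshold go left."""
    best = None
    # Candidate thresholds: every distinct value except the largest (which splits nothing off).
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:
        # Only one distinct x value: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (x[0], mean, mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_gradient_boosting(x, y, n_estimators=100, learning_rate=0.1):
    """Squared-error gradient boosting with decision stumps (depth-1 trees)."""
    base = sum(y) / len(y)  # initial model: the mean of the target
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For squared-error loss the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Each tree's contribution is shrunk by the learning rate.
        pred = [pi + learning_rate * predict_stump(stump, xi) for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
```

A smaller `learning_rate` shrinks each tree's contribution, so more trees are generally needed to reach the same training fit; the number of trees (`n_estimators` or `iterations`) and the learning rate are therefore tuned together.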
Time and score comparisons for validating the three GB methods on the testing set.
| | XGBoost | CatBoost | LightGBM |
|---|---|---|---|
| Prediction time | 0.76 sec | 8 sec | 19 sec |
| RMSE | 1.7595 | 1.7137 | 1.7344 |
| R² | 0.85 | 0.86 | 0.88 |
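The training and testing tables score the models with RMSE and R². A minimal sketch of both metrics in plain Python, assuming R² is the coefficient of determination 1 − SS_res/SS_tot; some tools instead report the squared Pearson correlation between observed and predicted values, and that alternative is not what this sketch computes.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error, expressed in the units of the target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Undefined (ZeroDivisionError) if all observed values are equal.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Under this definition SS_res/n equals RMSE² and SS_tot/n is the population variance σ² of the observations, so R² = 1 − RMSE²/σ².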
Figure 2. (A) XGBoost gain: number of observations affected by the splits based on a feature. (B) LightGBM gain: total gain on prediction from the splits based on a feature. (C) CatBoost gain: average gain on prediction from the splits based on a feature.
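The Figure 2 caption defines three different importance measures: the number of observations passing through splits on a feature, the total gain from those splits, and the average gain per split. A minimal sketch of how the three are aggregated, assuming a hypothetical list of `(feature, n_observations, gain)` records collected from every split node of every tree; in practice each library computes its importance scores internally and may normalise them.

```python
from collections import defaultdict


def importance_measures(splits):
    """Aggregate split records into three per-feature importance scores.

    splits: iterable of (feature, n_observations, gain) tuples, one per split
            node across all trees (a hypothetical record format).
    Returns three dicts keyed by feature:
      observations - total number of observations passing through its splits
      total_gain   - sum of the loss reduction from its splits
      avg_gain     - total_gain divided by the number of its splits
    """
    observations = defaultdict(int)
    total_gain = defaultdict(float)
    n_splits = defaultdict(int)
    for feature, n_obs, gain in splits:
        observations[feature] += n_obs
        total_gain[feature] += gain
        n_splits[feature] += 1
    avg_gain = {f: total_gain[f] / n_splits[f] for f in n_splits}
    return dict(observations), dict(total_gain), avg_gain
```

For instance, with the hypothetical splits `("SeedlotCod", 1000, 5.0)`, `("SeedlotCod", 300, 1.0)`, `("LAI", 400, 2.0)`, `("LAI", 200, 4.0)` and `("LAI", 100, 6.0)`, `SeedlotCod` ranks first by observations (1,300 vs 700) but `LAI` ranks first by total gain (12.0 vs 6.0) and by average gain (4.0 vs 3.0), so rankings based on different measures need not agree.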
Figure 3. Impact of variables for XGBoost (A), LightGBM (B), and CatBoost (C). Every observation has one dot in each row. The position of the dot on the x-axis is the impact of that feature on the model's prediction for the observation, and the colour of the dot represents the value of that feature for the observation.
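Each dot in a plot like Figure 3 is a Shapley value: a feature's contribution to one prediction, averaged over all orders in which features could be added to the model. For a few features it can be computed exactly by enumerating coalitions, as in the brute-force sketch below; this costs on the order of 2ⁿ model evaluations per feature, which is why libraries such as SHAP use tree-specific algorithms for boosted trees. The sketch assumes, as one common simplification, that a feature absent from a coalition is replaced by a single baseline value; the function name and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline.

    predict:  function taking a list of feature values and returning a number
    x:        the observation to explain
    baseline: reference values used for features outside a coalition
              (an assumption; tools often average over background data instead)
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take their observed value; the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For the toy model f(z) = 2z₀ + 3z₁ + z₀z₂ at x = (1, 2, 3) with a zero baseline, the values are (3.5, 6, 1.5): the interaction term z₀z₂ = 3 is split equally between features 0 and 2, and the values sum to f(x) − f(baseline) = 11.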
Figure 4. (A) Original raster layer of Site Index across the study forest, with missing values. (B) Predicted raster layer of Site Index based on the CatBoost model.
Figure 5. G×E interaction. (A) Spring temperature across the forest, increasing from south to north (orange: low; red: high). (B, C) Spatial variation in Site Index for Seedlot 104 (B) and Seedlot 207 (C). (D) Relationship between spring air temperature and Site Index for Seedlots 104 (blue), 207 (orange) and 274 (green).
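Panel (D) of Figure 5 compares how Site Index responds to spring temperature in different seedlots; if the slopes differ between seedlots, their relative performance depends on the environment, which is the G×E interaction the figure illustrates. A minimal sketch of that comparison, assuming hypothetical `(seedlot, spring_temp, site_index)` records and fitting an ordinary least-squares slope per seedlot. This is an illustrative check, not the method used in the study.

```python
from collections import defaultdict


def slopes_by_group(records):
    """OLS slope of Site Index on spring temperature, fitted separately per seedlot.

    records: iterable of (seedlot, spring_temp, site_index) tuples (hypothetical format).
    Returns {seedlot: slope}; markedly different slopes suggest a G x E interaction.
    """
    groups = defaultdict(list)
    for seedlot, temp, si in records:
        groups[seedlot].append((temp, si))
    slopes = {}
    for seedlot, pts in groups.items():
        n = len(pts)
        mean_t = sum(t for t, _ in pts) / n
        mean_s = sum(s for _, s in pts) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in pts)
        sxy = sum((t - mean_t) * (s - mean_s) for t, s in pts)
        if sxx == 0:
            continue  # no temperature spread in this seedlot: slope undefined
        slopes[seedlot] = sxy / sxx
    return slopes
```

A seedlot observed at only one temperature has no defined slope and is skipped rather than reported as zero.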