Tao Hu, Yuman Sun, Weiwei Jia, Dandan Li, Maosheng Zou, Mengku Zhang.
Abstract
We compared the prediction accuracy of machine learning methods and of hybrid methods that combine machine learning with ordinary Kriging (OK) for forest volume models built from multi-source remote sensing data and ground survey data. The study covered Larix olgensis, Pinus koraiensis, and Pinus sylvestris plantations at the Mengjiagang forest farm. Using data from the Chinese Academy of Forestry LiDAR, charge-coupled device, and hyperspectral (CAF-LiTCHy) integrated system, we extracted visible vegetation indices, texture features, terrain factors, and point cloud feature variables. Random forest (RF), support vector regression (SVR), and an artificial neural network (ANN) were used to estimate forest volume. At small spatial scales, the estimated volume of a sample plot is influenced by the surrounding environment and by neighboring observations. We therefore applied OK interpolation to the residuals of the three machine learning models to construct hybrid forest volume estimation models: random forest Kriging (RFK), support vector regression Kriging (SVRK), and artificial neural network Kriging (ANNK). The six models were tested using leave-one-out (Loo) cross-validation. All six models achieved $R^2_{Loo}$ values above 0.6, and each hybrid model was more accurate than its machine learning counterpart, to differing extents. The RFK hybrid model performed best, with an $R^2_{Loo}$ of 0.915.
Therefore, machine learning based on multi-source remote sensing factors is useful for forest volume estimation. In particular, the hybrid models that combine machine learning with OK greatly improved the accuracy of forest volume estimation, providing a fast and effective method for the remote sensing inversion of forest volume and facilitating the management of forest resources.
Keywords: artificial neural network (ANN); forest volume; multi-source remote sensing factor; ordinary Kriging (OK); random forest (RF); support vector regression (SVR)
Year: 2021 PMID: 34883798 PMCID: PMC8659858 DOI: 10.3390/s21237796
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Study area profile and multi-source dataset. The figure shows the specific location of the Mengjiagang forest farm (b) in China (a). The distribution of ground standard sample plots in forest compartments, the DOM used for extracting vegetation indices, and the DEM for extracting terrain factors are included in (b). (c) shows the LiDAR point cloud data of a ground standard plot.
The binary volume table parameters of main coniferous forest species in Northeast China [48].
| Tree Species | a | b | c |
|---|---|---|---|
| | 0.00005017 | 1.7583 | 1.14967 |
| | 0.00006353 | 1.9436 | 0.89689 |
| | 0.00006938 | 1.7631 | 1.03701 |
Note: a, b, and c are the parameter estimates.
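The table lists only the parameter estimates, not the equation they belong to. The sketch below assumes the two-variable power form V = a·D^b·H^c that is commonly used for tables of this kind. The functional form, the units (D in cm, H in m, V in m³), the function name, and the example tree are all assumptions for illustration and are not stated in the source.

```python
def binary_volume(dbh_cm: float, height_m: float, a: float, b: float, c: float) -> float:
    """Single-tree volume under the ASSUMED form V = a * D**b * H**c."""
    if dbh_cm <= 0 or height_m <= 0:
        raise ValueError("diameter and height must be positive")
    return a * dbh_cm ** b * height_m ** c


# Parameter estimates from the first data row of the table above.
# The tree measurements (20 cm, 18 m) are hypothetical.
volume = binary_volume(dbh_cm=20.0, height_m=18.0, a=0.00005017, b=1.7583, c=1.14967)
print(f"assumed single-tree volume: {volume:.4f}")
```

Under this assumed form, plot volume would be the sum over the trees in a plot, scaled to a per-hectare value.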
Extracted visible light vegetation index [22,23,49,50,51].
| Vegetation Index | Abbreviation | Calculation Formula |
|---|---|---|
| Normalized Green–Red Difference Index | NGRDI | (G − R)/(G + R) |
| Excess Green Index | EXG | 2g − r − b |
| Color Index of Vegetation | CIVE | 0.44r − 0.88g + 0.39b + 18.79 |
| Vegetation Index | VEG | g/(r^a × b^(1 − a)), a = 0.67 |
| Excess Green Minus Excess Red Index | EXGR | EXG − EXR = EXG − (1.4r − g) |
| Woebbecke Index | WI | (g − b)/(r − g) |
| Visible Band Different Vegetation Index | VDVI | (2G − R − B)/(2G + R + B) |
| Red–Green Ratio Index | RGRI | r/g |
| Normalized Green–Blue Difference Index | NGBDI | (G − B)/(G + B) |
| Green–Blue Ratio Index | GBRI | b/g |
| Green–Red and Blue Vegetation Index | GBRVI | (G² − B × R)/(G² + B × R) |
| Modified Green and Red Vegetation Index | MGRVI | (G² − R²)/(G² + R²) |
| Differential Enhanced Vegetation Index | DEVI | G/3G + R/3G + B/3G |
| Green Leaf Index | GLI | (2g − r − b)/(2g + r + b) |
| Combination Index | COM | 0.25EXG + 0.3EXGR + 0.33CIVE + 0.12VEG |
| Combination Index 2 | COM2 | 0.36EXG + 0.47CIVE + 0.17VEG |
| Excess Red Index | EXR | 1.4 × r − g |
R: red light channel. G: green light channel. B: blue light channel. r: standardized result for the red light channel, r = R/(R + G + B). g: standardized result for the green light channel, g = G/(R + G + B). b: standardized result for the blue light channel, b = B/(R + G + B).
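As a worked illustration of the table, the sketch below computes a handful of the indices for a single pixel. The function name and the example pixel values are hypothetical. EXGR is taken here as EXG minus EXR, which is one reading of the tabulated formula, and the function does not guard against every zero denominator.

```python
def visible_indices(R: float, G: float, B: float) -> dict:
    """Compute a few of the tabulated visible-light indices for one pixel."""
    total = R + G + B
    if total == 0:
        raise ValueError("R + G + B must be non-zero")
    # Standardized channels, as defined in the table note.
    r, g, b = R / total, G / total, B / total
    exg = 2 * g - r - b   # Excess Green
    exr = 1.4 * r - g     # Excess Red
    return {
        "NGRDI": (G - R) / (G + R),
        "EXG": exg,
        "EXR": exr,
        "EXGR": exg - exr,  # assumed reading: EXG minus EXR
        "VDVI": (2 * G - R - B) / (2 * G + R + B),
        "RGRI": r / g,
        "NGBDI": (G - B) / (G + B),
    }


# Hypothetical pixel with digital numbers R=100, G=150, B=50.
indices = visible_indices(100, 150, 50)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

For this pixel, r = 1/3, g = 1/2, and b = 1/6, so NGRDI is 0.2 and EXG is 0.5.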
Figure 2. Distribution of altitude (a) and slope (b) in Mengjiagang forest area, extracted based on DEM data and GIS spatial analysis.
Extracted point cloud characteristic variables [30].
| Point Cloud Characteristic Variable | Variables | Description |
|---|---|---|
| Point cloud height variable | H1, H5, H10, H20, H25, H30, H40, H50, H60, H70, H75, H80, H90, H95, H99 | Point cloud height percentile |
| | Hmax, Hmin, Hmean, Hmed, Hstd, Hvar, Hmad | Maximum, minimum, average, median, standard deviation, variance, and mean absolute deviation of point cloud height |
| | Hskew, Hkurt, Hcrr, Hcv | Skewness, kurtosis, canopy fluctuation rate, and coefficient of variation of point cloud height |
| | Hd0, Hd1, Hd2, Hd3, Hd4, Hd5, Hd6, Hd7, Hd8, Hd9 | Point cloud height density variable |
| Point cloud intensity variable | I1, I5, I10, I20, I25, I30, I40, I50, I60, I70, I75, I80, I90, I95, I99 | Point cloud intensity percentile |
| | Imax, Imin, Imean, Imed, Istd, Ivar, Imad | Maximum, minimum, average, median, standard deviation, variance, and mean absolute deviation of point cloud intensity |
| | Iskew, Ikurt, Icv | Skewness, kurtosis, and coefficient of variation of point cloud intensity |
Figure 3. Diagram of point cloud feature variables. The point cloud space is divided into different grids, according to certain distances in the x and y directions, and then further divided into different “layers”, according to the specified height (z) interval.
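The height variables in the table can be illustrated with a short sketch for the points of one grid cell. The percentile convention (linear interpolation), the use of ten equal-height layers between the minimum and maximum for the density variables, and all function names are assumptions; the source does not specify them.

```python
import math


def height_percentile(heights, p):
    """Percentile by linear interpolation between order statistics (one common convention)."""
    z = sorted(heights)
    rank = (p / 100) * (len(z) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return z[lo] + (z[hi] - z[lo]) * (rank - lo)


def height_density(heights, n_layers=10):
    """Share of points falling in each of n_layers equal-height slices (assumed definition)."""
    zmin, zmax = min(heights), max(heights)
    width = (zmax - zmin) / n_layers
    counts = [0] * n_layers
    for z in heights:
        # Points at the maximum height go into the top layer.
        k = n_layers - 1 if width == 0 else min(int((z - zmin) / width), n_layers - 1)
        counts[k] += 1
    return [c / len(heights) for c in counts]


def height_summary(heights):
    """A few of the summary statistics listed in the table."""
    n = len(heights)
    mean = sum(heights) / n
    var = sum((z - mean) ** 2 for z in heights) / n  # population variance
    return {
        "Hmax": max(heights),
        "Hmin": min(heights),
        "Hmean": mean,
        "Hmed": height_percentile(heights, 50),
        "Hvar": var,
        "Hstd": math.sqrt(var),
    }


# Hypothetical normalized point heights (m) for one grid cell.
cell_heights = [float(i) for i in range(11)]
print(height_percentile(cell_heights, 95))
print(height_density(cell_heights))
print(height_summary(cell_heights))
```

The intensity variables would follow the same pattern with return intensities in place of heights.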
Figure 4. Technology roadmap. Multi-source data are used to estimate forest volume, where blue is the processing method of the CCD image data source; green is the processing method of the LiDAR data source; yellow is the processing method of the ground standard sample data source; red represents the independent variables; white on the left represents the measured value; blue–green represents the estimation model.
Figure 5. Schematic diagram of random forest regression. Through bootstrapping, a number of weak learners (regression decision trees) are trained with different samples, parameters, and features, and the final result is output by the weighted average method.
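The bootstrap-and-average idea in this caption can be sketched in a heavily simplified form. This is not the configuration used in the study: each "tree" here is a one-split stump on a single randomly chosen feature, and the trees are averaged with equal weights. All names and the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature.

    Returns (threshold, left_mean, right_mean).
    """
    mean_all = sum(ys) / len(ys)
    best = (float("inf"), xs[0], mean_all, mean_all)  # fallback when no split is possible
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        j = rng.randrange(n_features)               # random feature for this tree
        stump = fit_stump([X[i][j] for i in idx], [y[i] for i in idx])
        forest.append((j, stump))
    return forest


def predict_forest(forest, row):
    """Equal-weight average of the individual tree predictions."""
    preds = [(ml if row[j] <= t else mr) for j, (t, ml, mr) in forest]
    return sum(preds) / len(preds)


# Toy data: the target depends on feature 0 only; feature 1 is uninformative.
X = [[float(i), float((i * 7) % 10)] for i in range(10)]
y = [10.0 if i < 5 else 20.0 for i in range(10)]
forest = fit_forest(X, y)
print(predict_forest(forest, [2.0, 3.0]), predict_forest(forest, [8.0, 3.0]))
```

A production model would grow deeper trees and choose among a random subset of features at every split, which is what established random forest libraries do.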
Figure 6. Structure model of the artificial neural network, showing the input, hidden, and output neurons. Each input neuron is connected to the hidden-layer neurons through selected weights. The weighted outputs are then combined and passed to the output neuron to form the output value.
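The flow described in this caption corresponds to a single forward pass through a network with one hidden layer. The sketch below assumes a tanh activation in the hidden layer and a linear output neuron; the source does not state the activation, layer sizes, or weights, so every value here is hypothetical.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> tanh hidden layer -> linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # The weighted hidden outputs are combined into a single output value.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
w_hidden = [[1.0, 0.0], [0.0, 1.0]]
b_hidden = [0.0, 0.0]
w_out = [2.0, 3.0]
b_out = 0.5
print(forward([1.0, 0.0], w_hidden, b_hidden, w_out, b_out))
```

Training would adjust the weights and biases to reduce the error between this output and the measured volume.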
Figure 7. Semi-variance function theoretical diagram. Nugget represents the variation caused by measurement or scale. Sill represents the sum of random variation and fixed variation. Partial Sill is the difference between Sill and Nugget. The Range is the separation distance between sampling points at which the semi-variance function rises from the initial Nugget to the Sill. The Sill effect (Nugget/Sill) is an important indicator of the degree of spatial autocorrelation; the smaller its value, the stronger the degree of spatial autocorrelation.
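To make these quantities concrete, the sketch below evaluates a spherical semi-variance model and the Nugget/Sill ratio. The spherical equation is the standard textbook form and is an assumption here, since the source does not print it. The parameter values are those reported for the RF residuals (RRF) in the ordinary Kriging parameter table of this document.

```python
def spherical_semivariance(h, nugget, partial_sill, range_):
    """Standard spherical model; the semi-variance is 0 at zero distance."""
    if h < 0:
        raise ValueError("distance must be non-negative")
    if h == 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill  # the Sill is reached at the Range
    s = h / range_
    return nugget + partial_sill * (1.5 * s - 0.5 * s ** 3)


def nugget_to_sill(nugget, partial_sill):
    """Sill effect as defined in the caption: Nugget / (Nugget + Partial Sill)."""
    return nugget / (nugget + partial_sill)


# RRF row: spherical model, Range 3.05 km, Nugget 803.01, Partial Sill 1772.39.
nugget, partial_sill, range_km = 803.01, 1772.39, 3.05
for h in (0.5, 1.5, 3.05, 5.0):
    print(h, round(spherical_semivariance(h, nugget, partial_sill, range_km), 2))
print(round(nugget_to_sill(nugget, partial_sill), 2))  # 0.31, matching the table
```

The same ratio computed for the RSVR and RANN rows reproduces the tabulated 0.20 and 0.30.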
Figure 8. Diagram of leave-one-out cross-validation. There are N samples; each sample is used once as the test sample, and the other N − 1 samples are used as training samples. This yields N fitted models and N test results, and the average of these N results is used to measure the performance of the model.
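The procedure in this caption can be sketched with a one-variable linear regression standing in for the actual estimators. The stand-in model, the formula used for the leave-one-out R², the choice of RMSE as a second score, and the toy data are all assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def loo_scores(xs, ys):
    """Hold out each sample once, refit on the rest, and score the held-out predictions."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    errors = [y - p for y, p in zip(ys, preds)]
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "RMSE": math.sqrt(ss_res / len(ys)),
        "R2_loo": 1 - ss_res / ss_tot,  # assumed definition of the leave-one-out R^2
    }


# Hypothetical predictor (e.g. a height percentile) and plot volumes.
xs = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
ys = [95.0, 130.0, 150.0, 190.0, 205.0, 250.0]
print(loo_scores(xs, ys))
```

Because every prediction comes from a model that never saw the held-out plot, the resulting scores are less optimistic than scores computed on the training data.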
Figure 9. Plot of correlation coefficients between the dependent variable (volume M) and the independent variables. Red represents positive correlation and blue represents negative correlation, and the smaller the ellipse and the darker the color, the higher the correlation between the two variables. The green box indicates that the column is the correlation between the dependent variable and each independent variable, and each ordinal number in the figure represents a variable. For example, 20 represents Hmed, and its correlation with the dependent variable M(1) is the ellipse corresponding to the number 20 in the green box.
Figure 10. Model estimation evaluation indicators. The bar chart visualizes the data in Table 5; the three colors indicate the three evaluation indicators. The models are compared by the heights of the bars in the chart.
Parameters of ordinary Kriging models of residuals and their accuracy.
| Residual (m³/ha) | Model | Range (km) | Nugget | Partial Sill | Sill Effect (Nugget/Sill) | | | |
|---|---|---|---|---|---|---|---|---|
| RRF | Spherical | 3.05 | 803.01 | 1772.39 | 0.31 | 43.9 | 46.3 | 0.25 |
| RSVR | Gaussian | 4.001 | 1136.22 | 4560.93 | 0.20 | 59.8 | 53.4 | 0.40 |
| RANN | Gaussian | 3.572 | 1975.30 | 4668.24 | 0.30 | 65.3 | 68.8 | 0.49 |
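The hybrid step, interpolating model residuals with ordinary Kriging and adding them back to the machine learning estimate, can be sketched as follows. This is a minimal illustration rather than the study's implementation: the plot coordinates, residuals, and machine learning prediction are hypothetical, the residual is assumed to be measured minus predicted, and the spherical model uses the RRF parameters from the table above.

```python
import math


def spherical(h, nugget=803.01, partial_sill=1772.39, range_=3.05):
    """Spherical semi-variance with the RRF parameters (distance in km)."""
    if h == 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    s = h / range_
    return nugget + partial_sill * (1.5 * s - 0.5 * s ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ok_predict(coords, values, target, gamma=spherical):
    """Ordinary Kriging estimate at target; returns (estimate, weights)."""
    n = len(coords)
    # Kriging system: semi-variances between plots, plus the unbiasedness constraint.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in coords] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights


# Hypothetical plot coordinates (km) and model residuals (m3/ha).
plots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
residuals = [5.0, -3.0, 2.0, 1.0]
residual_hat, weights = ok_predict(plots, residuals, (0.5, 0.5))

ml_prediction = 180.0  # hypothetical machine learning estimate at the target (m3/ha)
hybrid_prediction = ml_prediction + residual_hat
print(round(sum(weights), 6), round(hybrid_prediction, 2))
```

The Kriging weights sum to one by construction, and at an observed plot the estimate reproduces that plot's residual.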
Estimation accuracy evaluation of machine learning models and ordinary Kriging model hybrid.
| Model | | | $R^2_{Loo}$ | | Level of Accuracy |
|---|---|---|---|---|---|
| RF | 40.8 | 52.3 | 0.90 | / | / |
| SVR | 57.2 | 75.1 | 0.80 | / | / |
| ANN | 69.1 | 93.5 | 0.68 | / | / |
| RFK | 37.4 | 46.3 | 0.92 | 0.02 | 11.47% |
| SVRK | 45.3 | 59.8 | 0.86 | 0.06 | 20.37% |
| ANNK | 53.1 | 68.8 | 0.82 | 0.14 | 26.42% |
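The last two columns of this table appear to be derivable from the others: the fifth column matches the gain in the fourth column (the R² values) of each hybrid model over its base model, and the last column matches the relative reduction in the third column. This reading is inferred from the numbers, not stated in the source, and the sketch below only checks the arithmetic.

```python
# (third-column error value, fourth-column R^2) for each model, copied from the table.
rows = {
    "RF": (52.3, 0.90), "SVR": (75.1, 0.80), "ANN": (93.5, 0.68),
    "RFK": (46.3, 0.92), "SVRK": (59.8, 0.86), "ANNK": (68.8, 0.82),
}


def improvement(base, hybrid):
    """Return (R^2 gain, percentage reduction of the third-column error value)."""
    err_base, r2_base = rows[base]
    err_hybrid, r2_hybrid = rows[hybrid]
    r2_gain = round(r2_hybrid - r2_base, 2)
    error_reduction = round(100 * (err_base - err_hybrid) / err_base, 2)
    return r2_gain, error_reduction


for base, hybrid in (("RF", "RFK"), ("SVR", "SVRK"), ("ANN", "ANNK")):
    print(hybrid, improvement(base, hybrid))
```

The three pairs give (0.02, 11.47), (0.06, 20.37), and (0.14, 26.42), in agreement with the tabulated values.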
Figure 11. Importance of random forest characteristic variables. Using a Python script, an importance analysis of all variables involved in constructing the forest volume estimation model was performed with the random forest algorithm, and each variable was scored according to the magnitude of its contribution. (a) shows the score ranking of variables associated with point cloud height, and (b) shows the score ranking of the other variables.
Figure 12. Scatter plots of the estimated and measured values for the six models. The six models were compared by fitting a linear relationship between the measured and predicted volume values to analyze prediction accuracy.