Juanjuan Zhu, Xiu Jin, Shaowen Li, Yalu Han, Wenrui Zheng.
Abstract
The trace element boron (B) is an important factor in crop development, pollination, and fertilization. Available boron (AB) in soil is the main source of boron for crops. Rapid detection of AB matters for crop nutrition diagnosis, soil testing and fertilization, precision agriculture, scientific production management, and stable, high-quality yields. In this study, we propose a new method to predict soil available boron content using handheld nonimaging hyperspectroscopy in the visible-near-infrared range (350-1655 nm). Because boron is among the least abundant chemical elements in soil, a rapid and accurate method to detect and quantify soil available boron has yet to be developed. Visible-near-infrared (VIS-NIR) spectroscopy is widely used to detect and quantify soil available nutrients. There is, however, scant research on detecting soil boron from NIR data, and the performance of current regression models is still far from satisfactory. Our soil samples were collected from southern Anhui, China; their NIR spectra were measured, pretreated with 29 transformations, and modeled with 10 regression algorithms. Of all the tested methods, the SVM_RBF, BPNN, and PLS_RBF algorithms performed best, with coefficients of determination of 0.80∼0.82. In addition, the Random Forest algorithm (RFA), Successive Projection Algorithm (SPA), and Variable Importance in Projection (VIP) were used to extract the characteristic wavelengths of soil available boron, and the characteristic wavelength data were then modeled with three regression algorithms: SVM_RBF, PLS_RBF, and BPNN. A comparative analysis of prediction performance (R², RMSE, RPD, and RPIQ) with the models established on the full band showed that the RFA-MSC/BPNN model achieved the best performance. Compared with the best full-wavelength model, DT/SVM_RBF, it achieved on the test set a 3.06% increase in R², a 7.12% drop in RMSE, a 7.71% gain in RPD, and a 7.78% increase in RPIQ. Our work sheds light on how to achieve rapid quantification of the soil available boron concentration.
Year: 2022 PMID: 35990114 PMCID: PMC9388244 DOI: 10.1155/2022/9748257
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. The sampling areas for soil collection in southern Anhui, China.
Figure 2. The laboratory visible-near-infrared spectroscopy acquisition system.
Pretreatment methods applied to the visible-near-infrared spectra of the collected soil samples.
| Pretreatment methods | Abbreviations |
|---|---|
| Reflection spectrum without pretreatment method | RS |
| Detrending (dislodge tendency) | DT |
| First derivative | FD |
| Second derivative | SD |
| Mean center | MC |
| Logarithmic transformation | LG |
| First derivative with logarithmic transformation | LG + FD |
| Second derivative with logarithmic transformation | LG + SD |
| Multiplicative scatter correction | MSC |
| First derivative with multiplicative scatter correction | MSC + FD |
| Second derivative with multiplicative scatter correction | MSC + SD |
| Standard normal variate | SNV |
| Dislodge tendency with standard normal variate | SNV + DT |
| First derivative with standard normal variate | SNV + FD |
| Second derivative with standard normal variate | SNV + SD |
| Savitzky–Golay | SG |
| Dislodge tendency with Savitzky–Golay | SG + DT |
| First derivative with Savitzky–Golay | SG + FD |
| Second derivative with Savitzky–Golay | SG + SD |
| Mean center with Savitzky–Golay | SG + MC |
| Logarithmic transformation with Savitzky–Golay | SG + LG |
| First derivative with logarithmic transformation and Savitzky–Golay | SG + LG + FD |
| Second derivative with logarithmic transformation and Savitzky–Golay | SG + LG + SD |
| Multiplicative scatter correction with Savitzky–Golay | SG + MSC |
| First derivative with multiplicative scatter correction and Savitzky–Golay | SG + MSC + FD |
| Second derivative with multiplicative scatter correction and Savitzky–Golay | SG + MSC + SD |
| Standard normal variate with Savitzky–Golay | SG + SNV |
| Dislodge tendency with standard normal variate and Savitzky–Golay | SG + SNV + DT |
| First derivative with standard normal variate and Savitzky–Golay | SG + SNV + FD |
| Second derivative with standard normal variate and Savitzky–Golay | SG + SNV + SD |
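A few of the pretreatments listed above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, SNV is assumed to use the sample standard deviation, MSC is assumed to use the mean spectrum as its reference, and the first derivative is approximated by a simple difference between adjacent bands rather than a Savitzky–Golay derivative.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample SD; an assumption of this sketch
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_bands = len(spectra[0])
    reference = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s ≈ intercept + slope * reference
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def first_derivative(spectrum):
    """Difference between adjacent bands (unit band spacing assumed)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]
```

Combined transformations such as MSC + FD would then chain the calls, for example `first_derivative(msc(spectra)[0])`; the order of operations used in the study is not stated here and is an assumption.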
Figure 3. BPNN model architecture.
BPNN modeling parameters.
| Network layer | Number of nodes | Number of parameters |
|---|---|---|
| 0 | 16 | 20912 |
| 1 | 8 | 136 |
| 2 | 4 | 36 |
| 3 | 1 | 5 |
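The parameter counts in this table are consistent with a fully connected network whose input is the 1306 full-wavelength bands, with one bias per node. The sketch below checks that arithmetic; the dense-with-bias layout and the 1306-band input are assumptions inferred from the numbers, and `dense_param_counts` is a hypothetical helper name.

```python
def dense_param_counts(n_inputs, layer_sizes):
    """Weights plus biases for each fully connected layer, in order."""
    counts = []
    fan_in = n_inputs
    for n_nodes in layer_sizes:
        counts.append(fan_in * n_nodes + n_nodes)  # weight matrix + one bias per node
        fan_in = n_nodes
    return counts


# 1306 input bands (assumed), then layers of 16, 8, 4, and 1 nodes as in the table.
print(dense_param_counts(1306, [16, 8, 4, 1]))  # [20912, 136, 36, 5]
```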
The categories of different models based on RPD values.
| RPD | Level |
|---|---|
| RPD ≤ 1.4 | C |
| 1.4 < RPD ≤ 2.0 | B |
| RPD > 2.0 | A |
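The four evaluation metrics and the RPD levels above can be computed as in the following sketch. It uses the common definitions RPD = SD / RMSE and RPIQ = (Q3 − Q1) / RMSE; the use of the sample standard deviation and of `statistics.quantiles` with its default method are assumptions of this sketch, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, quantiles, stdev


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, RPD, and RPIQ for paired observed and predicted values."""
    y_mean = mean(y_true)
    residual_ss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total_ss = sum((t - y_mean) ** 2 for t in y_true)
    rmse = sqrt(residual_ss / len(y_true))
    q1, _, q3 = quantiles(y_true, n=4)  # quartiles of the observed values
    return {
        "r2": 1 - residual_ss / total_ss,
        "rmse": rmse,
        "rpd": stdev(y_true) / rmse,
        "rpiq": (q3 - q1) / rmse,
    }


def rpd_level(rpd):
    """Map an RPD value to the level used in the table above."""
    if rpd > 2.0:
        return "A"
    if rpd > 1.4:
        return "B"
    return "C"
```

For example, `rpd_level(regression_metrics(observed, predicted)["rpd"])` would give the level for one model, where `observed` and `predicted` are placeholder lists of boron contents in mg·kg−1.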
Soil available boron sample statistics.
| Type | Number | Max (mg·kg−1) | Min (mg·kg−1) | Average (mg·kg−1) | Standard deviation |
|---|---|---|---|---|---|
| Total | 188 | 3.91 | 0.24 | 0.87 | 0.86 |
| Train | 131 | 3.91 | 0.24 | 0.96 | 0.93 |
| Test | 57 | 3.65 | 0.28 | 0.68 | 0.66 |
Figure 4. Average spectra after various pretreatment transformations. (a) Average spectrum without Savitzky–Golay (SG) treatment; (b) average spectrum with SG treatment.
Figure 5. Accuracy of the regression models on the test set under all pretreatment transformations.
Regression algorithms that reached RPD level A under each pretreatment transformation.
| Pretreatment methods | Regression algorithms |
|---|---|
| DT | Elastic net, BPNN, SVM_Linear, SVM_RBF, SVM_Sigmoid, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| LG + SD | PLS_Linear |
| LG | Ridge, BPNN, SVM_Linear, SVM_RBF, SVM_Sigmoid, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| MC | Ridge, BPNN, SVM_Linear, SVM_RBF, SVM_Sigmoid, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| MSC | BPNN, SVM_RBF, PLS_RBF, and PLS_Sigmoid |
| RS | Ridge, SVM_Linear, SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SNV | SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SNV + DT | SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG + LG + FD | Ridge, SVM_RBF, SVM_Sigmoid, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG + LG | Elastic net, ridge, SVM_Linear, SVM_RBF, SVM_Sigmoid, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG | Lasso, ridge, BPNN, SVM_Linear, SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG + DT | Ridge, BPNN, SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG + FD | PLS_RBF and PLS_Sigmoid |
| SG + MC | Lasso, ridge, SVM_Linear, SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG + MSC | BPNN, SVM_RBF, PLS_RBF, and PLS_Sigmoid |
| SG + MSC + FD | PLS_Linear |
| SG + SNV | Elastic net, lasso, SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG + SNV + DT | SVM_RBF, PLS_Linear, PLS_RBF, and PLS_Sigmoid |
| SG + SNV + FD | PLS_RBF |
Figure 6. Ratio of performance to interquartile distance (RPIQ) values of regression models with different pretreatment transformations.
Figure 7. The statistics of RPD levels under different treatments.
The performance and parameters of the best models.
| Regression model | Pretreatment method | Test R² | Test RMSE | RPD level | RPIQ | Parameters |
|---|---|---|---|---|---|---|
| Elastic net | DT | 0.75 | 0.09 | A | 1.45 | Alpha = 2 |
| Lasso | SG | 0.72 | 0.12 | A | 1.30 | Alpha = 0.0001 |
| Ridge | LG | 0.77 | 0.08 | A | 1.56 | Alpha = 0.0005 |
| … | … | … | … | … | … | … |
| SVM_Linear | LG | 0.76 | 0.08 | A | 1.50 | |
| … | … | … | … | … | … | … |
| SVM_Sigmoid | LG | 0.76 | 0.10 | A | 1.37 | Gammas = 5 |
| PLS_Linear | SG + LG | 0.78 | 0.07 | A | 1.66 | |
| … | … | … | … | … | … | … |
| PLS_Sigmoid | SG + MC | 0.77 | 0.07 | A | 1.58 | |
Note. Bold indicates that the prediction accuracy of the model is good.
Results of the DT/SVM_RBF models.
| Variable selection method | No. of variables | Calibration R² | Calibration RMSE | Calibration RPD | Calibration RPIQ | Test R² | Test RMSE | Test RPD | Test RPIQ |
|---|---|---|---|---|---|---|---|---|---|
| Full wavelengths | 1306 | 0.989 | 0.008 | 9.490 | 5.138 | 0.821 | 0.042 | 3.287 | 2.155 |
| RFA | 18 | 0.988 | 0.095 | 9.034 | 4.937 | 0.594 | 0.563 | 1.582 | 0.710 |
| SPA | 2 | 0.749 | 0.428 | 2.003 | 1.094 | 0.735 | 0.534 | 1.861 | 0.786 |
| VIP | 32 | 0.988 | 0.095 | 9.017 | 4.928 | 0.666 | 0.510 | 1.745 | 0.783 |
Results of the SG + SNV + DT/PLS_RBF models.
| Variable selection method | No. of variables | Calibration R² | Calibration RMSE | Calibration RPD | Calibration RPIQ | Test R² | Test RMSE | Test RPD | Test RPIQ |
|---|---|---|---|---|---|---|---|---|---|
| Full wavelengths | 1306 | 0.988 | 0.009 | 8.963 | 4.800 | 0.810 | 0.043 | 3.217 | 2.126 |
| RFA | 367 | 0.690 | 0.476 | 1.803 | 0.985 | 0.819 | 0.376 | 2.370 | 1.064 |
| SPA | 3 | 0.542 | 0.578 | 1.483 | 0.810 | 0.782 | 0.412 | 2.159 | 0.969 |
| VIP | 367 | 0.690 | 0.476 | 1.803 | 0.985 | 0.819 | 0.376 | 2.370 | 1.064 |
Results of the MSC/BPNN models.
| Variable selection method | No. of variables | Calibration R² | Calibration RMSE | Calibration RPD | Calibration RPIQ | Test R² | Test RMSE | Test RPD | Test RPIQ |
|---|---|---|---|---|---|---|---|---|---|
| Full wavelengths | 1306 | 0.764 | 0.415 | 2.066 | 1.129 | 0.816 | 0.379 | 2.349 | 1.054 |
| … | … | … | … | … | … | … | … | … | … |
| SPA | 24 | 0.733 | 0.441 | 1.944 | 1.063 | 0.736 | 0.453 | 1.964 | 0.882 |
| … | … | … | … | … | … | … | … | … | … |
Note. Bold indicates that the prediction accuracy of the model is good.
Figure 8. Effects of different thresholds of variable importance in projection on models' performance.
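Threshold-based selection on VIP scores can be sketched as follows. The scores below are hypothetical placeholders, not values from the study, and `select_by_vip` is an illustrative helper; computing the VIP scores themselves requires a fitted PLS model and is outside this sketch.

```python
def select_by_vip(vip_scores, threshold):
    """Return wavelengths whose VIP score meets the threshold, highest score first."""
    kept = [(score, wavelength) for wavelength, score in vip_scores.items() if score >= threshold]
    return [wavelength for score, wavelength in sorted(kept, reverse=True)]


# Hypothetical VIP scores keyed by wavelength (nm); not values from the study.
example_scores = {872: 1.9, 1297: 1.6, 1248: 1.3, 500: 0.8, 1100: 0.4}

# Sweep a few thresholds to see how the number of retained wavelengths changes.
for threshold in (0.5, 1.0, 1.5):
    selected = select_by_vip(example_scores, threshold)
    print(threshold, len(selected), selected)
```

Raising the threshold keeps fewer wavelengths, so each threshold yields a different input set for the regression model to be evaluated on.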
Figure 9. Comparison of predicted and actual values for the VIP-MSC/BPNN and RFA-MSC/BPNN models.
Characteristic wavelengths selected by RFA and VIP in MSC/BPNN modeling.
| Variable selection method | No. of variables | Characteristic wavelengths (nm) |
|---|---|---|
| RFA | 328 | 538, 456, 466, 510, 464, 548, 535, 525, 513, 458, 460, 529, 465, 905, 463, 519, 518, 527, 520, 526, 498, 522, 530, 515, 528, 1654, 1653, 461, 521, 1248, 524, 523, 508, 769, 455, 1367, 1655, 545, 517, 871, 1429, 509, 758, 825, 1428, 879, 547, 1432, 1215, 734, 475, 900, 495, 555, 536, 532, 1369, 895, 539, 747, 549, 1298, 552, 546, 540, 1376, 551, 1430, 593, 457, 859, 858, 1371, 1433, 813, 828, 925, 554, 1295, 756, 1427, 507, 733, 760, 862, 1294, 736, 543, 752, 909, 1375, 1368, 502, 762, 560, 1292, 904, 906, 558, 917, 499, 732, 1434, 1651, 1361, 902, 586, 869, 494, 1553, 504, 449, 972, 516, 888, 1378, 583, 469, 759, 503, 588, 533, 450, 896, 876, 563, 512, 908, 1297, 749, 459, 916, 541, 497, 913, 1550, 1296, 1370, 605, 584, 891, 531, 912, 1394, 1282, 462, 610, 1249, 570, 550, 907, 1406, 1396, 779, 578, 514, 557, 1379, 882, 511, 746, 860, 1372, 924, 569, 1374, 814, 829, 1307, 923, 1431, 1250, 500, 1548, 1397, 454, 534, 777, 1395, 1513, 1435, 1156, 1547, 1363, 1365, 1549, 362, 1321, 830, 1437, 897, 448, 585, 901, 764, 992, 792, 875, 809, 914, 915, 490, 1648, 435, 893, 964, 1647, 492, 377, 1384, 1343, 726, 1596, 741, 1318, 1649, 968, 1381, 1151, 1499, 967, 710, 1642, 892, 870, 778, 865, 739, 486, 955, 1645, 1597, 1290, 1602, 872, 971, 864, 445, 982, 1650, 472, 745, 1225, 911, 1546, 1511, 833, 899, 594, 470, 1205, 885, 974, 1239, 1323, 921, 918, 1617, 350, 919, 1505, 844, 1284, 1345, 1561, 748, 1300, 1209, 1552, 1545, 744, 849, 1612, 1634, 1226, 981, 1149, 654, 1256, 1551, 1402, 1508, 883, 451, 672, 1506, 1652, 774, 1637, 670, 831, 474, 1605, 763, 440, 878, 1353, 1216, 1644, 958, 773, 880, 447, 1299, 384, 1041, 650, 1537, 381, 1393, 1482, 1266, 704, 866, 423, 367, 1313, 785, 988, 1415, 653, 963, 1144, 943, 481, 1019, 1598, and 783. |
| VIP | 108 | 872, 1297, 1248, 1296, 860, 1298, 891, 875, 1295, 1403, 905, 859, 1404, 839, 861, 1402, 862, 1294, 883, 864, 847, 458, 1293, 1401, 871, 895, 741, 1644, 1645, 888, 1405, 853, 863, 1594, 825, 457, 1292, 1593, 1299, 460, 893, 1646, 461, 462, 1400, 1643, 455, 858, 1595, 769, 451, 464, 892, 909, 459, 1592, 1291, 456, 848, 813, 1647, 450, 449, 749, 908, 1290, 1399, 866, 844, 1591, 752, 833, 784, 1596, 851, 1289, 454, 822, 857, 964, 1648, 1551, 963, 947, 1642, 1552, 738, 1550, 1288, 840, 965, 771, 463, 1398, 465, 946, 760, 962, 1553, 1406, 843, 1287, 1590, 736, 466, 1649, 453, and 1549. |
Figure 10. Distribution of characteristic wavelength points.