Zi-Heng Feng1,2, Lu-Yuan Wang1, Zhe-Qing Yang1, Yan-Yan Zhang1, Xiao Li3, Li Song1, Li He1, Jian-Zhao Duan1, Wei Feng1.
Abstract
Powdery mildew impairs wheat growth and restricts yield formation, so accurate monitoring of the disease is important for its prevention and control and for protecting world food security. Canopy spectral reflectance was measured with a ground feature hyperspectrometer during the flowering and grain-filling periods of wheat, and the measured spectra were smoothed with the Savitzky-Golay method to give the original reflectivity (OR). First, the OR was spectrally transformed using the mean centralization (MC), multivariate scattering correction (MSC), and standard normal variate transform (SNV) methods. Second, the feature bands of the four spectral datasets (OR, MC, MSC, and SNV) were extracted using a combination of the Competitive Adaptive Reweighted Sampling (CARS) algorithm and the Successive Projections Algorithm (SPA). Finally, partial least squares regression (PLSR), support vector regression (SVR), and random forest regression (RFR) were used to construct an optimal monitoring model for the wheat powdery mildew disease index (mean disease index, mDI). Comparisons based on Pearson correlation, optimized two-band combinations, and machine learning models showed that the MC spectral data had the best overall performance, making MC the preferred method for pretreating disease spectral data. The transformed spectral data combined with the CARS-SPA algorithm extracted characteristic bands more effectively: more bands were screened than from the OR data, and their positions were more evenly distributed. Among the machine learning methods, the RFR model performed best (coefficient of determination, R² = 0.741-0.852), while the SVR and PLSR models performed similarly to each other (R² = 0.733-0.836).
Taken together, spectral data transformed with the MC method and modeled with RFR (MC-RFR) gave the highest estimation accuracy, with a model R² of 0.849-0.852 and a root mean square error (RMSE) and mean absolute error (MAE) of 2.084-2.177 and 1.684-1.777, respectively. Compared with the OR combined with the RFR model (OR-RFR), the R² increased by 14.39%, and the RMSE and MAE decreased by 23.9% and 27.87%, respectively. Monitoring accuracy was also better at the flowering stage than at the grain-filling stage, owing to the relative stability of the canopy structure at flowering. Thus, preprocessing the spectral data with MC, which does not change the shape of the spectral curve, extracting characteristic bands with the CARS and SPA algorithms, and modeling with RFR to enhance the synergy between multiple variables yields a model (MC-CARS-SPA-RFR) that better captures the covariant relationship between the canopy spectrum and the disease, thereby improving the monitoring accuracy of wheat powdery mildew. These results provide ideas and methods for high-precision remote sensing monitoring of crop disease status.
Keywords: feature band selection; machine learning; remote sensing monitoring; spectral transformation; wheat powdery mildew
Year: 2022 PMID: 35386677 PMCID: PMC8977770 DOI: 10.3389/fpls.2022.828454
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. The overall technical workflow of the study.
Figure 2. Changes in spectral reflectance of the wheat canopy and correlation with the powdery mildew disease index (A,B, original reflectance at different disease severities in the high-sensitivity and medium-sensitivity types; C, spectral reflectance from the four different pretreatment methods; and D, correlation coefficients between spectral data and disease indexes).
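The three pretreatment methods named in the abstract can be sketched as follows. This is a minimal illustration using the common chemometrics definitions, which are an assumption here because the record does not spell out the formulas the authors used: MC subtracts the per-band mean across samples, SNV standardizes each spectrum by its own mean and standard deviation, and MSC regresses each spectrum on a reference (by default the mean spectrum) and removes the fitted offset and slope. The function names and the `spectra` array layout (samples in rows, bands in columns) are hypothetical.

```python
import numpy as np


def mean_center(spectra):
    """MC (assumed definition): subtract the per-band mean across samples."""
    return spectra - spectra.mean(axis=0)


def snv(spectra):
    """SNV: center and scale each spectrum by its own mean and std."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mu) / sd


def msc(spectra, reference=None):
    """MSC: fit s = slope * ref + intercept per spectrum, then undo the fit."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        # np.polyfit returns coefficients from highest degree down.
        slope, intercept = np.polyfit(ref, s, 1)
        corrected[i] = (s - intercept) / slope
    return corrected
```

Each function takes a 2-D array of shape `(n_samples, n_bands)` and returns an array of the same shape, so the outputs can be passed unchanged to any later band-selection or regression step.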
Figure 3. Determination coefficients for optimal two-band combinations with the different spectral variation methods [A–C for original reflectivity (OR); D–F for mean centralization (MC); G–I for multivariate scattering correction (MSC); and J–L for standard normal variate transform (SNV)].
Optimized forms of optimal two-band combinations and monitoring performance.
| Spectrum transform | ND bands | ND R² | SR bands | SR R² | SD bands | SD R² |
|---|---|---|---|---|---|---|
| OR | 1000, 485 | 0.401 | 668, 1000 | 0.417 | 750, 749 | 0.378 |
| MC | 431, 430 | 0.480 | 430, 432 | 0.480 | 750, 749 | 0.378 |
| MSC | 786, 785 | 0.409 | 786, 785 | 0.409 | 786, 785 | 0.447 |
| SNV | 431, 430 | 0.480 | 430, 432 | 0.480 | 431, 430 | 0.449 |

ND, normalized difference vegetation index; SR, simple ratio vegetation index; SD, simple difference vegetation index.
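A two-band optimization of this kind can be sketched as an exhaustive search over band pairs. The index forms below are the conventional ones, assumed here because the record does not give them: normalized difference (R1 − R2)/(R1 + R2), simple ratio R1/R2, and simple difference R1 − R2. The function name `two_band_r2` is hypothetical, and R² is taken as the squared Pearson correlation between the index and the disease index.

```python
import numpy as np


def two_band_r2(spectra, target, kind="ND"):
    """Return an (n_bands, n_bands) matrix of R^2 between a two-band index
    built from bands (i, j) and the target; the diagonal is left as NaN."""
    n_bands = spectra.shape[1]
    r2 = np.full((n_bands, n_bands), np.nan)
    for i in range(n_bands):
        for j in range(n_bands):
            if i == j:
                continue
            a, b = spectra[:, i], spectra[:, j]
            if kind == "ND":
                index = (a - b) / (a + b)
            elif kind == "SR":
                index = a / b
            elif kind == "SD":
                index = a - b
            else:
                raise ValueError(f"unknown index kind: {kind}")
            # Skip constant or non-finite indices, for which R^2 is undefined.
            if not np.all(np.isfinite(index)) or np.std(index) == 0:
                continue
            r2[i, j] = np.corrcoef(index, target)[0, 1] ** 2
    return r2
```

The best pair can then be read off with `np.unravel_index(np.nanargmax(r2), r2.shape)`. Note that ratio-type indices can divide by values near zero on centered data such as MC or SNV spectra, which is why the sketch skips non-finite results.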
Figure 4. The process of band selection by the Competitive Adaptive Reweighted Sampling (CARS) algorithm (A) and the bands selected by the CARS algorithm (B).
Figure 5. Variation in the root mean square error (RMSE) in the Successive Projections Algorithm (SPA; A) and the optimal bands chosen by the SPA algorithm (B).
Figure 6. Location and number of feature bands selected by the eight models.
Results of feature band selection for four different spectral transformation methods.
| Spectrum transform | Optimum band selection using SPA on the base of CARS |
|---|---|
| OR | 400,464,530,575,681,728,758,956 |
| MC | 415,467,530,560,680,719,756,789,812,849,872,942 |
| MSC | 414,471,587,662,727,757,794,811,867,873,881,950 |
| SNV | 419,473,514,530,623,712,729,768,835,872,942 |
Specific metrics for machine learning models based on different spectral transformation methods.
| Spectrum transform | Number of variables | Modeling method | Calibration R² | Calibration RMSE | Calibration MAE | Validation R² | Validation RMSE | Validation MAE |
|---|---|---|---|---|---|---|---|---|
| OR | 8 | PLSR | 0.744 | 2.865 | 2.325 | 0.733 | 2.934 | 2.675 |
| OR | 8 | SVR | 0.741 | 2.913 | 2.463 | 0.737 | 2.912 | 2.651 |
| OR | 8 | RFR | 0.746 | 2.728 | 2.215 | 0.741 | 2.872 | 2.604 |
| MSC | 12 | PLSR | 0.783 | 2.365 | 1.941 | 0.786 | 2.424 | 2.084 |
| MSC | 12 | SVR | 0.779 | 2.354 | 1.964 | 0.773 | 2.456 | 2.054 |
| MSC | 12 | RFR | 0.791 | 2.331 | 1.912 | 0.799 | 2.304 | 2.073 |
| SNV | 11 | PLSR | 0.823 | 2.217 | 1.898 | 0.813 | 2.282 | 1.929 |
| SNV | 11 | SVR | 0.824 | 2.211 | 1.865 | 0.818 | 2.254 | 1.945 |
| SNV | 11 | RFR | 0.832 | 2.202 | 1.835 | 0.828 | 2.252 | 1.924 |
| MC | 12 | PLSR | 0.835 | 2.173 | 1.802 | 0.828 | 2.268 | 1.822 |
| MC | 12 | SVR | 0.836 | 2.180 | 1.817 | 0.835 | 2.193 | 1.788 |
| MC | 12 | RFR | 0.852 | 2.084 | 1.684 | 0.849 | 2.177 | 1.777 |
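The percentage changes quoted in the abstract for MC-RFR relative to OR-RFR can be checked against the table values. The sketch below assumes one plausible reading, namely that each percentage is the mean of the relative changes on the calibration and validation sets; the record does not state how the authors computed them, and the variable and function names are hypothetical.

```python
# Values copied from the metrics table, as (calibration, validation) pairs.
or_rfr = {"R2": (0.746, 0.741), "RMSE": (2.728, 2.872), "MAE": (2.215, 2.604)}
mc_rfr = {"R2": (0.852, 0.849), "RMSE": (2.084, 2.177), "MAE": (1.684, 1.777)}


def mean_relative_change(metric):
    """Mean percentage change from OR-RFR to MC-RFR over both data sets."""
    changes = [
        (new - old) / old * 100.0
        for old, new in zip(or_rfr[metric], mc_rfr[metric])
    ]
    return sum(changes) / len(changes)


for metric in ("R2", "RMSE", "MAE"):
    print(metric, round(mean_relative_change(metric), 2))
```

Under this reading the script gives roughly +14.39% for R², −23.90% for RMSE, and −27.87% for MAE, which matches the figures stated in the abstract.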
Figure 7. Performance of spectral transformation data for the different models.
Figure 8. Comparison between the different models using the MC pre-processing method (A–C are modeling and testing of the model; D–F are the performance of different growth stages in the modeling set; and G–I are the performance of high-sensitivity and medium-sensitivity types in the modeling set).
Figure 9. Feature band weights for the random forest regression (RFR) model based on four different spectral data transformation methods.