Na Wang1, Jinrui Feng1, Longwei Li1, Jinming Liu1,2, Yong Sun3.
Abstract
The cellulose and hemicellulose (C and H) contents of corn stover (CS) strongly influence its biochemical conversion and utilization. To rapidly determine the C and H contents of CS by near-infrared spectroscopy (NIRS), five characteristic wavelength selection algorithms were used to select wavelength variables (WVs) for C and H: backward interval partial least squares (BIPLS), competitive adaptive reweighted sampling (CARS), BIPLS combined with CARS, BIPLS combined with a genetic simulated annealing algorithm (GSA), and CARS combined with GSA. The corresponding regression calibration models were then established. All five wavelength selection algorithms effectively eliminated irrelevant and redundant WVs, and their modeling performance was significantly superior to that of the full spectrum. Comparison showed that CARS combined with GSA had the best overall performance: the root mean squared errors of prediction for the C and H regression models were 0.786% and 0.893%, and the residual predictive deviations were 3.815 and 12.435, respectively. Wavelength selection can therefore effectively improve the accuracy of quantitative NIRS analysis of C and H contents in CS, providing theoretical support for the research and development of related online detection equipment.
Keywords: backward partial least squares; cellulose and hemicellulose contents; competitive adaptive reweighted sampling; genetic simulated annealing algorithm; near-infrared spectroscopy
Year: 2022 PMID: 35684314 PMCID: PMC9182057 DOI: 10.3390/molecules27113373
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Figure 1. Near-infrared reflectance spectra of all samples: (a) raw spectra; (b) preprocessed spectra.
Figure 2. The predicted residual mean and variance distribution map for cellulose (a) and hemicellulose (b).
Content distribution of sample set.
| Sample set | Composition | Amount | Mean (%) | Max (%) | Min (%) | SD (%) |
|---|---|---|---|---|---|---|
| Cset | Cellulose | 88 | 44.247 | 51.527 | 36.067 | 3.758 |
| Cset | Hemicellulose | 88 | 23.760 | 38.541 | 9.484 | 9.828 |
| Vset | Cellulose | 44 | 45.433 | 49.080 | 37.440 | 3.034 |
| Vset | Hemicellulose | 44 | 25.832 | 38.388 | 10.245 | 10.999 |
| ITset | Cellulose | 46 | 43.813 | 49.757 | 36.031 | 3.163 |
| ITset | Hemicellulose | 46 | 25.123 | 38.592 | 9.948 | 9.554 |
Cset: calibration set; Vset: validation set; ITset: independent test set; SD: standard deviation.
Preliminary selection results of spectral characteristic intervals for cellulose and hemicellulose, optimized using BiPLS.
| Intervals | Cellulose: selected intervals | Cellulose: RMSECV | Cellulose: selected wavelengths | Hemicellulose: selected intervals | Hemicellulose: RMSECV | Hemicellulose: selected wavelengths |
|---|---|---|---|---|---|---|
| 61 | 15 | 0.697 | 456 | 17 | 1.021 | 512 |
| 46 | 14 | 0.681 | 563 | 13 | 0.995 | 520 |
| 36 | 12 | 0.714 | 617 | 17 | 0.896 | 870 |
| 26 | 9 | 0.757 | 638 | 8 | 0.957 | 568 |
| 18 | 11 | 0.719 | 1128 | 8 | 1.130 | 819 |
| 12 | 9 | 0.747 | 1384 | 6 | 1.143 | 921 |
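The wavelength counts in this table are consistent with splitting the 1845 full-spectrum variables (the number of wavelengths listed for Full-PLS in the results table) into nearly equal intervals. The following is a minimal sketch of such a split. It assumes that leftover variables are assigned one each to the first intervals, which is a common convention but is not confirmed by the source, and `split_intervals` is a hypothetical helper rather than the authors' code.

```python
def split_intervals(n_variables, n_intervals):
    """Return (start, stop) index pairs covering range(n_variables) in nearly equal chunks."""
    if not 0 < n_intervals <= n_variables:
        raise ValueError("need 0 < n_intervals <= n_variables")
    base, extra = divmod(n_variables, n_intervals)
    bounds, start = [], 0
    for i in range(n_intervals):
        # Assumption: the first `extra` intervals are one variable wider.
        width = base + (1 if i < extra else 0)
        bounds.append((start, start + width))
        start += width
    return bounds


N_VARIABLES = 1845  # full-spectrum variable count reported for Full-PLS

# Print the smallest and largest interval width for each interval count in the table.
for k in (61, 46, 36, 26, 18, 12):
    widths = [stop - start for start, stop in split_intervals(N_VARIABLES, k)]
    print(k, min(widths), max(widths))
```

With 26 intervals the widths are 70 or 71, so nine retained intervals can hold 638 variables (eight of width 71 and one of width 70), which matches the cellulose entry in that row.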
Figure 3. The relationship between RMSECV and WV and the number of selections. WV: wavelength variable.
Figure 4. The characteristic wavelength variables selected by BIPLS-CARS optimized for cellulose (a) and hemicellulose (b).
Figure 5. The characteristic wavelength variable distributions of cellulose (a) and hemicellulose (b) optimized by BIPLS, CARS200, BIPLS-CARS, BIPLS-GSA, and CARS-GSA.
The results for wavelength selection.
| Component | Model | NW¹ | LVs | R² (calibration) | R² (prediction) | RMSEC (%) | RMSEP (%) | RPD | MT² (m) | TT³ (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| Cellulose | Full-PLS | 1845 | 15 | 0.980 | 0.917 | 0.527 | 0.870 | 3.448 | 14.043 | 1.598 |
| Cellulose | BIPLS | 432 | 13 | 0.982 | 0.925 | 0.496 | 0.830 | 3.612 | 166.072 | 1.567 |
| Cellulose | CARS200 | 241 | 16 | 0.994 | 0.920 | 0.284 | 0.861 | 3.482 | 264.298 | 1.459 |
| Cellulose | BIPLS-CARS | 169 | 10 | 0.977 | 0.928 | 0.565 | 0.802 | 3.738 | 367.505 | 1.427 |
| Cellulose | BIPLS-GSA | 241 | 11 | 0.979 | 0.927 | 0.541 | 0.801 | 3.747 | 1858.209 | 1.450 |
| Cellulose | CARS-GSA | 200 | 8 | 0.971 | 0.930 | 0.628 | 0.786 | 3.815 | 1523.729 | 1.433 |
| Hemicellulose | Full-PLS | 1845 | 18 | 0.998 | 0.990 | 0.383 | 1.033 | 10.529 | 15.358 | 1.638 |
| Hemicellulose | BIPLS | 306 | 13 | 0.995 | 0.993 | 0.643 | 0.927 | 11.982 | 99.427 | 1.543 |
| Hemicellulose | CARS200 | 106 | 17 | 0.998 | 0.993 | 0.323 | 0.922 | 12.041 | 176.317 | 1.432 |
| Hemicellulose | BIPLS-CARS | 115 | 12 | 0.996 | 0.993 | 0.629 | 0.912 | 12.182 | 228.093 | 1.376 |
| Hemicellulose | BIPLS-GSA | 138 | 15 | 0.996 | 0.993 | 0.597 | 0.904 | 12.283 | 1801.827 | 1.454 |
| Hemicellulose | CARS-GSA | 70 | 12 | 0.998 | 0.993 | 0.438 | 0.893 | 12.435 | 1124.644 | 1.416 |
¹ Number of wavelengths; ² modeling time spent on selecting wavelengths and training the model; ³ testing time for predicting 30 new samples using the established model.
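For readers who want to reproduce the RMSEP and RPD columns from their own predictions, the sketch below computes both quantities. It assumes the usual definitions: RMSE is the square root of the mean squared difference between reference and predicted values, and RPD is the standard deviation of the reference values divided by the RMSE. The source does not state which standard deviation convention the authors used, so the sample standard deviation here is an assumption, and the function names and data are illustrative only.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values / RMSE.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Hypothetical reference and predicted cellulose contents (%), not data from the paper.
y_ref = [40.0, 42.0, 44.0, 46.0, 48.0]
y_hat = [40.5, 41.5, 44.5, 45.5, 48.5]

print(f"RMSEP = {rmse(y_ref, y_hat):.3f} %")
print(f"RPD   = {rpd(y_ref, y_hat):.3f}")
```

As a plausibility check against the table, the CARS-GSA cellulose RPD of 3.815 multiplied by its RMSEP of 0.786% gives a standard deviation of about 3.0%, close to the validation-set SD of 3.034% listed in the sample-set table.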
Figure 6. Prediction scatter plot for cellulose (a) and hemicellulose (b). RMSEI and RPDI represent the RMSE and RPD of the independent test set, respectively.
Figure 7. Distribution of the sampling locations.