Hongyan Zhu, Jun-Li Xu.
Abstract
Walnuts of different varieties and geographical origins usually differ in nutritional value, which leads to large differences in final price. Conventional analytical techniques have unavoidable limitations; for example, chemical analysis is usually time-consuming and labor-intensive. This work therefore applies Fourier transform mid-infrared spectroscopy coupled with machine learning algorithms to the rapid and accurate classification of walnuts from ten varieties produced in four provinces. Three types of models were developed using five machine learning classifiers to (1) differentiate the four geographical origins; (2) identify varieties produced in the same origin; and (3) classify all 10 varieties from the four origins. Prior to modeling, a wavelet transform algorithm was used to smooth and denoise the spectra. Identification of varieties within the same origin performed best (accuracy = 100% for some origins), followed by classification of the four origins (accuracy = 96.97%), while discrimination of all 10 varieties performed worst (accuracy = 87.88%). The results indicated that, for some classifiers, using the full spectral range of 700–4350 cm⁻¹ is inferior to using subsets of optimal spectral variables. Additionally, the back propagation neural network (BPNN) delivered the best model performance, while random forests (RF) produced the worst. Hence, this work shows that the authentication and provenance of walnuts can be determined effectively using Fourier transform mid-infrared spectroscopy combined with machine learning algorithms.
Keywords: Fourier transform mid-infrared spectroscopy; genetic algorithm-partial least squares; machine learning; successive projection algorithm; walnut
Year: 2020 PMID: 33126520 PMCID: PMC7662659 DOI: 10.3390/molecules25214987
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
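The abstract mentions wavelet-based smoothing and denoising of the spectra but does not state the wavelet family, decomposition level, or threshold rule in this record. As a rough illustration only, the sketch below assumes a single-level Haar transform with soft thresholding; the function name `haar_denoise` and the `threshold` value are hypothetical and are not taken from the paper.

```python
import math


def haar_denoise(signal, threshold):
    """One-level Haar wavelet denoising with soft thresholding.

    Assumption: Haar wavelet, one level, fixed threshold. The paper's
    actual wavelet settings are not given in this record.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    # Forward transform: scaled pairwise sums (approximation)
    # and scaled pairwise differences (detail).
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    # Soft thresholding shrinks each detail coefficient toward zero;
    # coefficients smaller than the threshold become zero.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    # Inverse transform rebuilds the smoothed signal.
    smoothed = []
    for a, d in zip(approx, detail):
        smoothed.extend([(a + d) / s, (a - d) / s])
    return smoothed


noisy = [1.0, 1.2, 5.0, 5.1]  # made-up intensities, not real spectral data
print(haar_denoise(noisy, threshold=1.0))
```

With a threshold larger than every detail coefficient, each adjacent pair collapses to its mean, while a threshold of zero returns the input unchanged (up to floating-point error). A real spectrum would normally use several decomposition levels and a data-driven threshold.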
Figure 1. The preprocessed mean spectra calculated for each geographic origin.
Figure 2. Histogram of the selection frequency of individual variables.
Figure 3. The root mean square error of cross-validation (RMSECV) as a function of the number of variables included. The global, the better, and the suggested models are marked with green, blue, and red stars, respectively.
Figure 4. The stability of individual variables obtained by applying uninformative variable elimination (UVE).
Figure 5. The spectral variables selected by UVE-SPA (uninformative variable elimination combined with the successive projection algorithm).
Figure 6. Score plot of the first three principal components (PCs), with the geographic origins shown in different markers.
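For reference, the RMSECV plotted in Figure 3 is conventionally defined as follows. The notation is introduced here for illustration and is not taken from the source: $n$ is the number of calibration samples, $y_i$ the reference value of sample $i$, and $\hat{y}_{(i)}$ the prediction for sample $i$ from a model fitted without the cross-validation fold containing it.

```latex
\mathrm{RMSECV} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^{2}}
```

A lower RMSECV for a given number of selected variables indicates a better cross-validated fit, which is presumably the criterion used to mark the candidate models in the figure.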
Modeling performance (accuracy, %) for classifying the four geographic origins on the test set, with models built from the full spectral range and from subsets of selected variables using different classifiers.
| Classifier | Parameter | Yunnan | Xinjiang | Shaanxi | Hebei | Overall |
|---|---|---|---|---|---|---|
| ELM | 62 | 84.62 | 70.00 | 100.00 | 52.63 | 74.24 |
| RF | 40 | 61.54 | 65.00 | 57.14 | 89.47 | 69.70 |
| RBF | 66 | 69.23 | 100.00 | 64.29 | 94.74 | 84.85 |
| PLS-DA | 12 | 69.23 | 50.00 | 71.43 | 84.21 | 68.18 |
| UVE-SPA-ELM | 56 | 61.54 | 90.00 | 71.43 | 94.74 | 81.82 |
| UVE-SPA-RF | 88 | 58.85 | 75.00 | 50.00 | 89.47 | 69.70 |
| UVE-SPA-RBF | 70 | 76.92 | 35.00 | 28.57 | 78.95 | 54.55 |
| UVE-SPA-PLS-DA | 6 | 53.85 | 90.00 | 100.00 | 89.47 | 84.85 |
| UVE-SPA-BPNN | 8 | 100.00 | 100.00 | 93.33 | 94.74 | 96.97 |
| GA-PLS-ELM | 108 | 69.23 | 85.00 | 71.43 | 78.95 | 77.27 |
| GA-PLS-RF | 60 | 58.85 | 75.00 | 50.00 | 89.47 | 69.70 |
| GA-PLS-RBF | 15 | 61.54 | 90.00 | 64.29 | 94.74 | 80.30 |
| GA-PLS-PLS-DA | 9 | 84.62 | 85.00 | 92.86 | 94.74 | 89.39 |
| GA-PLS-BPNN | 6 | 92.31 | 95.00 | 92.86 | 100.00 | 95.45 |
Note: Parameter is the number of latent variables (LVs) for partial least squares discriminant analysis (PLS-DA), the number of trees for random forest (RF), the number of nodes in the hidden layer for radial basis function (RBF), the number of nodes for extreme learning machine (ELM), and the number of neurons in the hidden layer for back propagation neural network (BPNN).
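The Overall column can be cross-checked against the per-origin columns. The sketch below assumes that Overall is the pooled fraction of correctly classified test samples, and that the per-origin test-set sizes are the sums of the test counts in the Data Partition column of the sample table at the end of this record (Yunnan 13, Xinjiang 20, Shaanxi 14, Hebei 19). The helper name `overall_accuracy` is hypothetical.

```python
# Test-set sizes per origin, summed from the Data Partition column
# of the sample table (e.g. Yunnan: 7 + 6 = 13).
TEST_COUNTS = {"Yunnan": 13, "Xinjiang": 20, "Shaanxi": 14, "Hebei": 19}


def overall_accuracy(per_origin_pct):
    """Return (correct, total, overall %) from per-origin accuracies in percent."""
    correct = 0
    for origin, pct in per_origin_pct.items():
        # Table values are rounded to two decimals, so recover the
        # whole number of correctly classified samples by rounding.
        correct += round(pct / 100 * TEST_COUNTS[origin])
    total = sum(TEST_COUNTS[origin] for origin in per_origin_pct)
    return correct, total, round(100 * correct / total, 2)


# Full-spectrum ELM row of the table above.
elm = {"Yunnan": 84.62, "Xinjiang": 70.00, "Shaanxi": 100.00, "Hebei": 52.63}
print(overall_accuracy(elm))  # (49, 66, 74.24)
```

Under this assumption the ELM row reproduces the reported 74.24% (49 of 66 test samples). Rows whose per-origin values do not correspond to a whole number of samples, such as 58.85 for Yunnan, cannot be checked exactly this way.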
Modeling performance (accuracy, %) for classifying varieties within the same origin on the test set, with models built from the full spectral range and from subsets of selected variables using different classifiers.
| Origin | Variable Input | ELM | RF | RBF | PLS-DA | BPNN |
|---|---|---|---|---|---|---|
| Yunnan | Full | 84.62 | 84.62 | 92.31 | 84.62 | - |
| | GA-PLS | 92.31 | 84.62 | 92.31 | 92.31 | 100.00 |
| | UVE-SPA | 92.31 | 84.62 | 100.00 | 92.31 | 100.00 |
| Xinjiang | Full | 70.00 | 65.00 | 90.00 | 70.00 | - |
| | GA-PLS | 90.00 | 65.00 | 90.00 | 65.00 | 94.74 |
| | UVE-SPA | 100.00 | 70.00 | 85.00 | 65.00 | 100.00 |
| Shaanxi | Full | 85.71 | 92.31 | 100.00 | 100.00 | - |
| | GA-PLS | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | UVE-SPA | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Hebei | Full | 73.68 | 68.42 | 68.42 | 78.95 | - |
| | GA-PLS | 78.95 | 73.68 | 73.68 | 78.95 | 94.74 |
| | UVE-SPA | 84.21 | 68.42 | 63.16 | 73.68 | 89.47 |
Modeling performance (accuracy, %) for classifying all 10 varieties on the test set, with models built from the full spectral range and from subsets of selected variables using different classifiers.
| Variable Input | ELM | RF | RBF | PLS-DA | BPNN |
|---|---|---|---|---|---|
| Full | 60.61 | 54.55 | 68.18 | 42.42 | - |
| GA-PLS | 68.18 | 53.03 | 71.21 | 60.61 | 87.88 |
| UVE-SPA | 66.67 | 48.48 | 60.61 | 51.52 | 83.33 |
Details of the collected walnut samples and the characterization of each variety.
| Province | Geographical Location | Variety | Characteristic | Sample Size | Data Partition (Training/Test Samples) |
|---|---|---|---|---|---|
| Yunnan | Southwest China; 97°32′–106°12′ E, 21°08′–29°15′ N | No. 1: Yangbi Dapao | The most widely planted variety in Yunnan, mainly distributed on the western slope of Cangshan Mountain, accounting for about 80% of Yangbi walnuts. | 20 | 13/7 |
| | | No. 2: Yangbi Caoguo | Mostly found in Meiji Village, West Town of Cangshan, Yunnan. The inner folds are well developed, and whole kernels can be collected. | 19 | 13/6 |
| Xinjiang | Northwest China; 73°41′–96°18′ E, 34°22′–49°33′ N | No. 3: Hetian 185 | The main walnut variety cultivated in Xinjiang, mostly found in southern Xinjiang. | 19 | 13/6 |
| | | No. 4: Xinfeng | Grown at altitudes of 1700–2400 m, it is named after the skin, which is as thin as paper, and the whole kernel is easy to collect. | 20 | 13/7 |
| | | No. 5: Xinxin 2 | An early-maturing variety with high yield and good stability. | 20 | 13/7 |
| Shaanxi | Northwest China; 105°29′–111°15′ E, 31°42′–39°35′ N | No. 6: Liao 4 | A crossbreed with strong adaptability and cold and drought tolerance, making it suitable for northern cultivation areas. | 20 | 13/7 |
| | | No. 7: Xiangling | A mid-ripening variety, ideal for cultivation in thick and fertile soil. | 20 | 13/7 |
| Hebei | Northern China; 113°04′–119°53′ E, 36°01′–42°37′ N | No. 8: Qingxiang | A late-maturing variety introduced from Japan. | 16 | 10/6 |
| | | No. 9: Liao 1 | The main walnut variety cultivated in Hebei. | 18 | 12/6 |
| | | No. 10: Liao 8 | One of the early-fruiting walnut varieties bred by hybridization; it matures in mid-September. | 20 | 13/7 |