Soo-In Sohn, Subramani Pandian, John-Lewis Zinia Zaukuu, Young-Ju Oh, Soo-Yun Park, Chae-Sun Na, Eun-Kyoung Shin, Hyeon-Jung Kang, Tae-Hun Ryu, Woo-Suk Cho, Youn-Sung Cho.
Abstract
In recent years, the rapid development of genetically modified (GM) technology has raised concerns about the safety of GM crops and foods for human health and the ecological environment. Gene flow from GM crops to other crops, especially in the Brassicaceae family, might pose a threat to the environment due to their weediness. Hence, finding reliable, quick, and low-cost methods to detect and monitor the presence of GM crops and crop products is important. In this study, we used visible near-infrared (Vis-NIR) spectroscopy to discriminate GM and non-GM Brassica napus, B. rapa, and F1 hybrids (B. rapa × GM B. napus). Vis-NIR spectra were first collected from the plants and preprocessed. Combinations of four preprocessing methods and eight modeling approaches were then evaluated. Among these combinations, Savitzky-Golay preprocessing with a Support Vector Machine was the optimal model for discriminating GM, non-GM, and hybrid plants, with the highest accuracy (100%). A Convolutional Neural Network with Normalization reached an accuracy of 98.9%. The same accuracy was obtained with the Gradient Boosted Trees and Fast Large Margin approaches. Phenolic acid concentrations in the different plants were then assessed using GC-MS analysis. Partial least squares regression analysis of the Vis-NIR spectra and biochemical characteristics showed significant correlations between the two. The results showed that handheld Vis-NIR spectroscopy combined with chemometric analyses could be used for the effective discrimination of GM and non-GM B. napus, B. rapa, and F1 hybrids. Biochemical composition analysis can also be combined with the Vis-NIR spectra for efficient discrimination.
Keywords: Brassica rapa; GM detection; Vis-NIR spectroscopy; chemometrics; machine learning; transgenic canola
Year: 2021 PMID: 35008646 PMCID: PMC8745187 DOI: 10.3390/ijms23010220
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Average spectra obtained from all the plants. (A) Raw spectra. (B) Savitzky-Golay. (C) Normalization. (D) Standard Normal Variate.
Figure 2. Principal Component Analysis (PCA) paired plot (A) and PC1 vs. PC2 plot (B) for the visualization of B. napus, GM B. napus, B. rapa and F1 hybrids.
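The preprocessing steps named in Figure 1 can be illustrated with a minimal sketch. It assumes a spectrum stored as a plain list of reflectance values, uses the sample standard deviation for Standard Normal Variate (SNV), and uses a 5-point quadratic Savitzky-Golay smoothing window. The window length, polynomial order, and the reflectance values are illustrative assumptions, not the settings or data of the study.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample SD (n - 1); some implementations divide by n instead
    return [(x - m) / s for x in spectrum]


# Classic 5-point quadratic Savitzky-Golay smoothing coefficients (divided by 35).
SG5 = (-3, 12, 17, 12, -3)


def savgol5(spectrum):
    """Smooth interior points with a 5-point SG filter; the two points at each edge are kept as-is."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * x for c, x in zip(SG5, window)) / 35
    return out


# Hypothetical reflectance values, for illustration only.
reflectance = [0.42, 0.45, 0.51, 0.58, 0.62, 0.60, 0.55]
scaled = snv(reflectance)
smoothed = savgol5(reflectance)
```

After SNV, each spectrum has zero mean and unit standard deviation, which removes multiplicative scatter differences between samples. The Savitzky-Golay filter instead fits a local polynomial, so it suppresses noise while leaving smooth trends intact.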
Classification accuracy of each combination of preprocessing method and model for reflectance spectra from B. napus, GM B. napus, B. rapa and F1 hybrids.
| S. No | Model | Preprocessing | Average Accuracy (%) | Run Time |
|---|---|---|---|---|
| 1. | Linear Discriminant Analysis | Raw spectra | 78.3 | - |
| | | Normalization | 98.6 | - |
| | | Standard Normal Variate | 98.6 | - |
| | | Savitzky-Golay | 99.8 | - |
| 2. | Support Vector Machine | Raw spectra | 98.4 | 21,417 |
| | | Normalization | 79.6 | 41,166 |
| | | Standard Normal Variate | 98.4 | 22,074 |
| | | Savitzky-Golay | 100.0 | 30,556 |
| 3. | Generalized Linear Model | Raw spectra | 85.4 | 32,905 |
| | | Normalization | 87.1 | 19,854 |
| | | Standard Normal Variate | 90.3 | 26,768 |
| | | Savitzky-Golay | 97.9 | 14,038 |
| 4. | Gradient Boosted Trees | Raw spectra | 95.2 | 841,966 |
| | | Normalization | 97.3 | 790,162 |
| | | Standard Normal Variate | 97.3 | 988,233 |
| | | Savitzky-Golay | 98.9 | 990,738 |
| 5. | Naive Bayes | Raw spectra | 70.5 | 6546 |
| | | Normalization | 74.2 | 6535 |
| | | Standard Normal Variate | 81.2 | 6210 |
| | | Savitzky-Golay | 91.4 | 6661 |
| 6. | Fast Large Margin | Raw spectra | 93.6 | 37,002 |
| | | Normalization | 71.2 | 38,845 |
| | | Standard Normal Variate | 96.2 | 37,597 |
| | | Savitzky-Golay | 98.9 | 17,611 |
| 7. | Random Forest | Raw spectra | 79.0 | 31,558 |
| | | Normalization | 86.6 | 30,336 |
| | | Standard Normal Variate | 90.9 | 31,411 |
| | | Savitzky-Golay | 91.4 | 31,590 |
| 8. | Convolutional Neural Network | Raw spectra | 91.4 | 7529 |
| | | Normalization | 98.9 | 7123 |
| | | Standard Normal Variate | 97.9 | 5850 |
| | | Savitzky-Golay | 96.8 | 5450 |
Figure 3. Linear Discriminant Analysis for the effective discrimination of B. napus, GM B. napus, B. rapa and F1 hybrids, shown without confidence circles (A) and with confidence circles (B).
Mean classification accuracy (species accuracy, % ± SE) of the different preprocessing methods and classification models using reflectance spectra.
| Model | Raw Spectra | Normalization | Savitzky-Golay | SNV | Significance |
|---|---|---|---|---|---|
| Naive Bayes | 74.2 ± 9.5 | 74.5 ± 3.3 b | 91.8 ± 3.1 | 82.7 ± 4.9 | ns |
| Generalized Linear Model | 86.7 ± 3.7 | 87.2 ± 2 ab | 97.3 ± 1.5 | 91.3 ± 6.3 | ns |
| Fast Large Margin | 94.1 ± 4.4 A | 73.1 ± 4.4 Bb | 99.2 ± 0.8 A | 96.3 ± 3 A | ** |
| Convolutional Neural Network | 92.8 ± 3.5 | 99.2 ± 0.8 a | 96.9 ± 3.1 | 98 ± 1.2 | ns |
| Gradient Boosted Trees | 76.1 ± 12.4 | 85.6 ± 6.4 ab | 85.2 ± 6.3 | 59.6 ± 22 | ns |
| Random Forest | 80.8 ± 6 | 87.2 ± 2.3 ab | 92.9 ± 3.5 | 91.5 ± 3.3 | ns |
| Support Vector Machine | 98.4 ± 1.6 A | 80 ± 3.6 Bb | 100 ± 0 A | 98.3 ± 1.7 A | ** |
| significance | ns | ** | ns | ns | |
ns: not significant; **: significant at p ≤ 0.05. Different lowercase letters indicate significant differences within a column, and different capital letters indicate significant differences within a row.
Analysis of variance of the percentage of correctly classified B. napus, GM B. napus, B. rapa and F1 hybrids from four different preprocessing methods and seven different classification models using reflectance spectra.
| Source | df | SS | MS | F-Value | p-Value |
|---|---|---|---|---|---|
| Preprocessing (P) | 3 | 0.186074 | 0.062025 | 4.07 | 0.0095 |
| Model (M) | 6 | 0.494012 | 0.082335 | 5.4 | <0.0001 |
| P × M | 18 | 0.426077 | 0.023671 | 1.55 | 0.0925 |
| Error | 84 | 1.280539 | 0.015245 | | |
| Total | 111 | 2.386702 | | | |
df: degrees of freedom. SS: sum of squares. MS: mean square (SS divided by df).
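As a check on how the columns of the ANOVA table relate, the following sketch recomputes the mean squares and F-values from the df and SS entries. The dictionary layout and function name are illustrative choices; the numbers are those reported in the table.

```python
# (df, SS) pairs copied from the ANOVA table.
rows = {
    "Preprocessing (P)": (3, 0.186074),
    "Model (M)": (6, 0.494012),
    "P x M": (18, 0.426077),
    "Error": (84, 1.280539),
}


def mean_square(df, ss):
    """Mean square is the sum of squares divided by its degrees of freedom."""
    return ss / df


ms_error = mean_square(*rows["Error"])

# Each F-value is the effect's mean square divided by the error mean square.
f_values = {
    name: mean_square(df, ss) / ms_error
    for name, (df, ss) in rows.items()
    if name != "Error"
}

for name, f in f_values.items():
    print(f"{name}: F = {f:.2f}")
# Preprocessing (P): F = 4.07
# Model (M): F = 5.40
# P x M: F = 1.55

total_df = sum(df for df, _ in rows.values())  # 111
total_ss = sum(ss for _, ss in rows.values())  # about 2.386702
```

The df and SS of the four sources add up to the Total row, and the recomputed F-values match the reported ones to two decimals.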
Confusion matrix of species discrimination using better performing combinations of preprocessing methods and models.
| Predicted \ True | B. napus | GM B. napus | B. rapa | F1 hybrid | Class precision (%) |
|---|---|---|---|---|---|
| B. napus | 43 | 0 | 0 | 0 | 100 |
| GM B. napus | 0 | 42 | 0 | 0 | 100 |
| B. rapa | 0 | 0 | 44 | 0 | 100 |
| F1 hybrid | 0 | 0 | 0 | 56 | 100 |
| Class recall (%) | 100 | 100 | 100 | 100 | |

| Predicted \ True | B. napus | GM B. napus | B. rapa | F1 hybrid | Class precision (%) |
|---|---|---|---|---|---|
| B. napus | 42 | 0 | 0 | 0 | 100 |
| GM B. napus | 0 | 44 | 0 | 0 | 100 |
| B. rapa | 0 | 0 | 40 | 0 | 100 |
| F1 hybrid | 0 | 0 | 2 | 58 | 96.67 |
| Class recall (%) | 100 | 100 | 95.24 | 100 | |

| Predicted \ True | B. napus | GM B. napus | B. rapa | F1 hybrid | Class precision (%) |
|---|---|---|---|---|---|
| B. napus | 42 | 0 | 0 | 0 | 100 |
| GM B. napus | 0 | 44 | 0 | 0 | 100 |
| B. rapa | 0 | 0 | 40 | 0 | 100 |
| F1 hybrid | 0 | 0 | 2 | 58 | 96.67 |
| Class recall (%) | 100 | 100 | 95.24 | 100 | |
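The precision and recall figures in these matrices follow directly from the counts. The sketch below recomputes them for the second matrix, reading rows as predicted classes and columns as true classes, which is the orientation implied by the reported percentages. Classes are referred to by index (0 to 3, with index 3 being the F1 hybrid) and the function names are illustrative.

```python
# Second confusion matrix: rows = predicted class, columns = true class.
matrix = [
    [42, 0, 0, 0],
    [0, 44, 0, 0],
    [0, 0, 40, 0],
    [0, 0, 2, 58],
]


def precision(m, k):
    """Correct predictions of class k divided by everything predicted as k (row sum)."""
    return 100 * m[k][k] / sum(m[k])


def recall(m, k):
    """Correct predictions of class k divided by everything truly in class k (column sum)."""
    return 100 * m[k][k] / sum(row[k] for row in m)


def accuracy(m):
    """Diagonal total divided by the total number of samples."""
    correct = sum(m[i][i] for i in range(len(m)))
    return 100 * correct / sum(map(sum, m))


print(round(precision(matrix, 3), 2))  # 96.67
print(round(recall(matrix, 2), 2))     # 95.24
print(round(accuracy(matrix), 2))      # 98.92
```

Two samples of the third class were predicted as F1 hybrids, which lowers the recall of that class and the precision of the F1 hybrid class while every other entry stays at 100%.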
Total phenolic acid composition analysis using GC-MS.
| S. No | Phenolic Acids | GM | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Soluble | Insoluble | Total | Soluble | Insoluble | Total | Soluble | Insoluble | Total | Soluble | Insoluble | Total |
| 1 | | 2.2 ± 0.4 | 1.1 ± 0.3 | 3.3 ± 0.6 | 2.3 ± 0.1 | 0.9 ± 0.1 | 3.1 ± 0.3 | 4.1 ± 0.7 | 1.3 ± 0.3 | 5.4 ± 1.0 | 2.2 ± 0.4 | 1.3 ± 0.8 | 3.5 ± 0.8 |
| 2 | vanillic acid | 3.0 ± 0.6 | 1.0 ± 0.2 | 4.0 ± 0.7 | 2.7 ± 0.3 | 1.1 ± 0.2 | 3.8 ± 0.5 | 3.9 ± 0.8 | 1.0 ± 0.2 | 4.9 ± 0.9 | 2.6 ± 0.6 | 1.0 ± 0.1 | 3.6 ± 0.5 |
| 3 | syringic acid | 0.3 ± 0.2 | 0.3 ± 0.2 | 0.6 ± 0.3 | 0.3 ± 0.2 | 0.3 ± 0.2 | 0.6 ± 0.3 | 0.6 ± 0.3 | 0.4 ± 0.3 | 1.0 ± 0.4 | 0.3 ± 0.2 | 0.3 ± 0.04 | 0.6 ± 0.2 |
| 4 | | 56.1 ± 14.4 | 6.9 ± 0.7 | 63.0 ± 13.8 | 28.1 ± 17.1 | 5.5 ± 0.9 | 33.7 ± 18.0 | 49.9 ± 15.7 | 12.7 ± 1.5 | 62.6 ± 15.0 | 56.2 ± 6.4 | 6.1 ± 4.2 | 62.3 ± 10.5 |
| 5 | ferulic acid | 1498.8 ± 184.2 | 110.4 ± 17.6 | 1609.2 ± 197.5 | 1255.9 ± 120.6 | 128.1 ± 8.3 | 1384.0 ± 125.7 | 891.5 ± 51.4 | 49.4 ± 9.2 | 940.9 ± 60.5 | 1167.8 ± 132.1 | 86.3 ± 10.2 | 1254.1 ± 140.5 |
| 6 | sinapic acid | 877.3 ± 138.9 | 26.5 ± 4.8 | 903.77 ± 140.38 | 935.8 ± 427.3 | 37.2 ± 14.6 | 973.06 ± 441.83 | 1439.2 ± 518.4 | 35.2 ± 8.6 | 1474.3 ± 511.6 | 923.6 ± 73.0 | 35.2 ± 6.0 | 958.80 ± 78.64 |
Figure 4. Score (A) and loading (B) plots of principal components 1 and 2 of the PCA results obtained from data on six total phenolic acids of four varieties.
PLSR prediction of phenolic compounds in all plants.
| Phenolic Compound | Latent Variables | R² | RMSEC (µg/g) | R²CV | RMSECV (µg/g) |
|---|---|---|---|---|---|
| | 4 | 0.93 | 0.26 | 0.91 | 0.28 |
| Vanillic acid | 4 | 0.94 | 0.13 | 0.93 | 0.14 |
| Syringic acid | 4 | 0.92 | 0.04 | 0.91 | 0.05 |
| | 4 | 0.91 | 3.68 | 0.89 | 4.03 |
| Ferulic acid | 4 | 0.94 | 58.91 | 0.93 | 64.34 |
| Sinapic acid | 4 | 0.94 | 57.89 | 0.93 | 63.64 |
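The fit statistics in this table can be illustrated with a minimal sketch of the usual definitions of the coefficient of determination and the root mean square error. Applied to calibration fits, the error is the RMSEC; applied to cross-validated predictions, it is the RMSECV. Dividing by n is an assumption here, since some chemometrics packages correct RMSEC for the number of latent variables. The observed values reuse the four total vanillic acid means from the phenolic acid table purely as inputs, and the predicted values are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between reference and predicted concentrations."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS about the mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


observed = [4.0, 3.8, 4.9, 3.6]   # total vanillic acid means from the phenolic acid table
predicted = [4.1, 3.7, 4.8, 3.7]  # hypothetical model output, for illustration only

fit_r2 = r_squared(observed, predicted)
fit_rmse = rmse(observed, predicted)
```

An R²CV close to R², together with an RMSECV only slightly above the RMSEC, as reported for all six compounds, suggests that the four-latent-variable models are not strongly overfitted.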
Figure 5. Partial least squares regression (PLSR) prediction of vanillic acid in all the plants.
Figure 6. Representative photos of plants selected for the spectral analysis.