Hai-Feng Cui, Zi-Hong Ye, Lu Xu, Xian-Shu Fu, Cui-Wen Fan, Xiao-Ping Yu.
Abstract
This paper reports the application of near infrared (NIR) spectroscopy and pattern recognition methods to rapid and automatic discrimination of the genotypes (parent, transgenic, and parent-transgenic hybrid) of cotton plants. Diffuse reflectance NIR spectra of representative cotton seeds (n = 120) and leaves (n = 123) were measured in the range of 4000-12000 cm⁻¹. A practical problem when developing classification models is the degradation and even breakdown of models caused by outliers. Considering the high-dimensional nature and uncertainty of potential spectral outliers, robust principal component analysis (rPCA) was applied to each separate sample group to detect and exclude outliers. The influence of different data preprocessing methods on model prediction performance was also investigated. The results demonstrate that rPCA can effectively detect outliers and maintain the efficiency of discriminant analysis. Moreover, the classification accuracy can be significantly improved by second-order derivative and standard normal variate (SNV). The best partial least squares discriminant analysis (PLSDA) models obtained total classification accuracy of 100% and 97.6% for seeds and leaves, respectively.
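The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. SNV centers and scales each spectrum by its own mean and standard deviation. The record does not say how the second-order derivative was computed, so the plain central finite difference below is an assumption made for illustration (smoothed derivative filters are a common alternative in NIR work). The function names and the toy spectrum are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum
    by its own mean and sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(x - m) / s for x in spectrum]


def second_difference(spectrum):
    """Central finite-difference estimate of the second derivative.

    Assumes evenly spaced wavenumbers with unit spacing (an illustrative
    assumption). The two endpoints are dropped, so the result has
    len(spectrum) - 2 values.
    """
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]


# Hypothetical toy spectrum, not taken from the paper.
toy = [0.10, 0.12, 0.18, 0.30, 0.45, 0.52, 0.50, 0.41]
toy_snv = snv(toy)
toy_d2 = second_difference(toy)
```

SNV leaves a spectrum unchanged under an additive offset or a positive multiplicative scaling, and the second difference cancels any constant or linear baseline. Both properties address scatter and baseline variation of the kind that typically affects diffuse reflectance spectra.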
Year: 2012 PMID: 22666635 PMCID: PMC3361229 DOI: 10.1155/2012/793468
Source DB: PubMed Journal: J Anal Methods Chem ISSN: 2090-8873 Impact factor: 2.193
Analyzed cotton plants.
| Objects | Acquisition time | Plantation | Genotype | Sample size |
|---|---|---|---|---|
| Seeds | 2010.9 | Zhejiang University | Parent 222 | 41 |
| Leaves | 2011.10 | China Jiliang University | Parent 222 | 41 |
Figure 1. Some of the raw NIR spectra of cotton leaves (a) and seeds (b). The genotypes were (1) parent 222, (2) transgenic 07-19, and (3) hybrid 08-6.
Figure 2. Some of the NIR spectra of cotton leaves preprocessed by (a) smoothing, (b) second-order derivative, and (c) SNV. The genotypes were (1) parent 222, (2) transgenic 07-19, and (3) hybrid 08-6.
Figure 3. Some of the NIR spectra of cotton seeds preprocessed by (a) smoothing, (b) second-order derivative, and (c) SNV. The genotypes were (1) parent 222, (2) transgenic 07-19, and (3) hybrid 08-6.
Figure 4. Robust PCA outlier diagnosis of the transgenic cotton leaves based on raw spectra.
Results of outlier diagnosis.
| Objects | Genotype | Orthogonal outliers | Bad PCA leverages | Final data sizes |
|---|---|---|---|---|
| Seeds | Parent 222 | 16,19 | 1 | 38 |
| Leaves | Parent 222 | 34 | 9 | 39 |
Splitting of the data, with outliers removed, into training and test sets.
| Objects | Genotype | Clean data size | Splitting (training/test) | Total (training/test) |
|---|---|---|---|---|
| Seeds | Parent 222 | 38 | 25/13 | 75/36 |
| Leaves | Parent 222 | 39 | 25/14 | 75/42 |
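A per-genotype split like the one tabulated above can be sketched as follows. The record does not state how samples were assigned to the training and test sets, so the seeded random shuffle here is purely an assumption, and `split_group` is a hypothetical helper. Only the group sizes (38 and 39 clean parent 222 samples, 25 of each used for training) come from the table.

```python
import random


def split_group(indices, n_train, seed=0):
    """Randomly split one genotype group into training and test indices.

    Random assignment is an illustrative assumption; the source does not
    describe the selection method.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


# Clean parent 222 group sizes from the table: 38 seeds, 39 leaves.
seed_train, seed_test = split_group(range(38), 25)
leaf_train, leaf_test = split_group(range(39), 25)

print(len(seed_train), len(seed_test))  # 25 13
print(len(leaf_train), len(leaf_test))  # 25 14
```

Splitting each genotype group separately keeps the training set balanced at 25 samples per class, which matches the tabulated training total of 75 for three genotypes.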
Classification results of test set with different preprocessing methods.
| Objects | Preprocessing | Wrongly classified | Total accuracy |
|---|---|---|---|
| Seeds | Raw | 5 | 86.1% |
| | Smoothing | 3 | 91.7% |
| | 2nd derivative | 0 | 100.0% |
| | SNV | 1 | 97.2% |
| Leaves | Raw | 4 | 90.5% |
| | Smoothing | 7 | 83.3% |
| | 2nd derivative | 1 | 97.6% |
| | SNV | 1 | 97.6% |
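The total accuracies in this table follow from the misclassification counts and the test-set sizes in the splitting table (36 seed samples and 42 leaf samples). The short check below reproduces them; `total_accuracy` is a hypothetical helper name, and the counts are copied from the tables in this record.

```python
def total_accuracy(n_test, n_wrong):
    """Percentage of test samples classified correctly."""
    return 100.0 * (n_test - n_wrong) / n_test


# Test-set sizes and misclassification counts as tabulated in this record.
TEST_SIZES = {"Seeds": 36, "Leaves": 42}
WRONG = {
    "Seeds": {"Raw": 5, "Smoothing": 3, "2nd derivative": 0, "SNV": 1},
    "Leaves": {"Raw": 4, "Smoothing": 7, "2nd derivative": 1, "SNV": 1},
}

for obj, counts in WRONG.items():
    for method, n_wrong in counts.items():
        acc = total_accuracy(TEST_SIZES[obj], n_wrong)
        print(f"{obj:6s} {method:14s} {acc:.1f}%")
```

For example, 5 errors among 36 seed test samples gives 31/36, or 86.1%, and 1 error among 42 leaf test samples gives 41/42, or 97.6%.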