Abstract
BACKGROUND: In crop production systems, genetic markers are increasingly used to distinguish individuals within a larger population based on their genetic make-up. Supervised approaches cannot be applied directly to genotyping data because these data are neither continuous, nor nominal, nor ordinal, but only partially ordered. A strategy is therefore needed to encode the polymorphism between samples so that known supervised approaches can be applied. Moreover, finding a minimal set of molecular markers with optimal ability to discriminate, for example, between given groups of varieties is important because genotyping can be costly in laboratory consumables, labor, and time. This feature selection problem also needs special care because of the specific nature of the data.
Year: 2011 PMID: 21595989 PMCID: PMC3128031 DOI: 10.1186/1471-2105-12-177
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Sample of SSR genotyping data
| | SSR1 | SSR2 | SSR3 | SSR4 | ... |
|---|---|---|---|---|---|
| Sample 1 | 177/181 | 191/193 | 172 | 176/182/186 | ... |
| Sample 2 | 177/181 | - | 172/174 | 176 | ... |
| Sample 3 | 175/177 | 193 | 168/172 | 180/182 | ... |
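To make the structure of these data concrete, the sketch below parses the cells of the sample table into sets of allele sizes and compares two samples marker by marker. It is only an illustration under stated assumptions: the helper names `parse_genotype`, `jaccard`, and `marker_similarities` are hypothetical, `-` is assumed to mean that no allele was called, and the Jaccard index is a generic set similarity chosen for the example, not the encoding proposed in the article.

```python
def parse_genotype(cell: str) -> frozenset:
    """Turn a cell such as '177/181' into a set of allele sizes.

    Assumption: '-' (or an empty cell) means no allele was called.
    """
    cell = cell.strip()
    if cell in ("", "-"):
        return frozenset()
    return frozenset(int(allele) for allele in cell.split("/"))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Shared alleles divided by all alleles observed; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Rows copied from the sample table (markers SSR1 to SSR4).
samples = {
    "Sample 1": ["177/181", "191/193", "172", "176/182/186"],
    "Sample 2": ["177/181", "-", "172/174", "176"],
    "Sample 3": ["175/177", "193", "168/172", "180/182"],
}
encoded = {name: [parse_genotype(c) for c in cells] for name, cells in samples.items()}


def marker_similarities(x: str, y: str) -> list:
    """Per-marker Jaccard similarity between two named samples."""
    return [jaccard(a, b) for a, b in zip(encoded[x], encoded[y])]


sims = marker_similarities("Sample 1", "Sample 2")
print(sims)
```

Because each cell holds an unordered set with a varying number of alleles, two samples can agree fully, partly, or not at all at a marker, which is why such data do not fit a plain numeric or categorical encoding.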
KPCLDA cross-validation results
| KPCLDA | N = 2 | N = 3 | N = 4 | N = 5 | N = 6 | N = 7 | N = 8 |
|---|---|---|---|---|---|---|---|
| tobType | | | | | | | |
| FS | 0.02 | | | | | | |
| FS, | 0.02 | 0.02 | 0.01 | 0.01 | | | |
| MIFS | 0.01 | | | | | | |
| mRMR | 0.16 | 0.06 | | | | | |
| landRace | | | | | | | |
| FS | 0.40 | 0.19 | 0.11 | 0.08 | | | |
| FS, | 0.43 | 0.08 | 0.05 | 0.03 | 0.04 | | |
| MIFS | 0.19 | 0.16 | 0.14 | 0.09 | 0.04 | | |
| mRMR | 0.19 | 0.11 | 0.06 | 0.06 | 0.04 | | |
| geoVar | | | | | | | |
| FS | 0.35 | 0.25 | 0.24 | 0.22 | 0.21 | | |
| FS, | 0.37 | 0.29 | 0.31 | 0.28 | 0.19 | | |
| MIFS | 0.31 | 0.26 | 0.19 | 0.19 | 0.26 | | |
| mRMR | 0.35 | 0.20 | 0.21 | | | | |
| ORvar | | | | | | | |
| FS | 0.13 | | | | | | |
| FS, | 0.29 | 0.16 | 0.17 | 0.11 | 0.14 | 0.12 | 0.08 |
| MIFS | 0.19 | 0.11 | 0.14 | 0.11 | 0.18 | 0.09 | |
| mRMR | 0.26 | 0.13 | 0.09 | 0.13 | 0.09 | 0.06 | 0.07 |
KLDA cross-validation results
| KLDA | N = 2 | N = 3 | N = 4 | N = 5 | N = 6 | N = 7 | N = 8 |
|---|---|---|---|---|---|---|---|
| tobType | | | | | | | |
| FS | 0.01 | | | | | | |
| FS, | 0.02 | 0.02 | 0.01 | 0.01 | | | |
| MIFS | 0.09 | 0.02 | 0.03 | 0.01 | 0.01 | 0.02 | 0.02 |
| mRMR | 0.23 | 0.19 | 0.06 | 0.08 | 0.04 | 0.02 | 0.04 |
| landRace | | | | | | | |
| FS | 0.36 | 0.16 | | | | | |
| FS, | 0.40 | 0.16 | 0.07 | 0.04 | 0.03 | 0.01 | 0.01 |
| MIFS | 0.11 | 0.09 | 0.04 | 0.02 | 0.01 | | |
| mRMR | 0.36 | 0.11 | 0.05 | 0.02 | 0.01 | | |
| geoVar | | | | | | | |
| FS | 0.36 | 0.18 | 0.17 | | | | |
| FS, | 0.35 | 0.33 | 0.31 | 0.20 | 0.16 | 0.18 | |
| MIFS | 0.36 | 0.26 | 0.25 | 0.21 | 0.20 | 0.22 | 0.17 |
| mRMR | 0.28 | 0.25 | 0.17 | 0.16 | | | |
| ORvar | | | | | | | |
| FS | 0.12 | 0.09 | | | | | |
| FS, | 0.27 | 0.14 | 0.11 | 0.13 | 0.11 | 0.07 | 0.10 |
| MIFS | 0.19 | 0.12 | 0.14 | 0.12 | 0.11 | 0.14 | 0.06 |
| mRMR | 0.24 | 0.07 | 0.08 | 0.07 | | | |
Cross-validation results using the full set of markers
| Dataset | KPCLDA | KLDA |
|---|---|---|
| tobType | 0 +/- 0 | 0.065 +/- 0.029 |
| landRace | 0.117 +/- 0.012 | 0 +/- 0 |
| geoVar | 0.132 +/- 0.041 | 0.081 +/- 0.04 |
| ORvar | 0.069 +/- 0.044 | 0.098 +/- 0.043 |
Simulation results
| Quantiles | KPCLDA | KLDA |
|---|---|---|
| 0% | 0.04 | 0.01 |
| 0.5% | 0.08 | 0.03 |
| 1% | 0.10 | 0.04 |
| 5% | 0.15 | 0.08 |
| 25% | 0.23 | 0.14 |
| 50% | 0.31 | 0.20 |
| 75% | 0.38 | 0.25 |
| 100% | 0.78 | 0.62 |
Summary of the cross-validation results for all possible combinations of 5 markers among 19 markers (landRace dataset)
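
As a rough check on the size of that exhaustive search, the sketch below counts the distinct 5-marker subsets that can be drawn from 19 markers. The marker names `M1` to `M19` are placeholders, not the identifiers used in the study.

```python
from itertools import combinations
from math import comb

# Placeholder names for the 19 markers (hypothetical labels).
markers = [f"M{i}" for i in range(1, 20)]

# Every unordered choice of 5 markers out of the 19.
subsets = list(combinations(markers, 5))
n_subsets = len(subsets)

# The enumeration agrees with the binomial coefficient C(19, 5).
assert n_subsets == comb(19, 5)
print(n_subsets)  # 11628
```

Each of these 11,628 subsets corresponds to one candidate marker panel, so a quantile table over them describes how results are distributed across all panels of that size.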