Caio Canella Vieira, Jing Zhou, Mariola Usovsky, Tri Vuong, Amanda D Howland, Dongho Lee, Zenglu Li, Jianfeng Zhou, Grover Shannon, Henry T Nguyen, Pengyin Chen.
Abstract
Southern root-knot nematode [SRKN, Meloidogyne incognita (Kofoid & White) Chitwood] is a plant-parasitic nematode that is challenging to control because of its short life cycle, wide host range, and limited management options; genetic resistance is the main option for efficiently controlling the damage caused by SRKN. To date, a major quantitative trait locus (QTL) mapped on chromosome (Chr.) 10 plays an essential role in resistance to SRKN in soybean varieties. The confidence of trait-locus associations discovered by traditional methods is often limited by the assumptions that individual single nucleotide polymorphisms (SNPs) always act independently and that the phenotype follows a Gaussian distribution. Therefore, the objective of this study was to conduct machine learning (ML)-based genome-wide association studies (GWAS) utilizing Random Forest (RF) and Support Vector Machine (SVM) algorithms to unveil novel regions of the soybean genome associated with resistance to SRKN. A total of 717 breeding lines derived from 330 unique bi-parental populations were genotyped with the Illumina Infinium BARCSoySNP6K BeadChip and phenotyped for SRKN resistance in a greenhouse. A GWAS pipeline involving a supervised feature dimension reduction based on Variable Importance in Projection (VIP) and SNP detection based on classification accuracy was proposed. Minor-effect SNPs were detected by the proposed ML-GWAS methodology but were not identified using the Bayesian-information and linkage-disequilibrium Iteratively Nested Keyway (BLINK), Fixed and Random Model Circulating Probability Unification (FarmCPU), and Enriched Compressed Mixed Linear Model (ECMLM) models. Besides the genomic region on Chr. 10 that can explain most of the variance in SRKN resistance, additional minor-effect SNPs were also identified on Chrs. 10 and 11.
The findings in this study demonstrated that overfitting in GWAS may lead to lower prediction accuracy, and that detecting significant SNPs based on classification accuracy limited false-positive associations. Expanding the genetic basis of resistance to SRKN can potentially reduce the selection pressure on the major QTL on Chr. 10 and achieve higher levels of resistance.
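The idea of detecting SNPs by classification accuracy can be illustrated with a small sketch. It adds ranked SNPs one at a time and records hold-out accuracy for each predictor count. This is not the authors' pipeline: a nearest-centroid classifier stands in for RF and SVM so the example needs only the standard library, and the function names, the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_centroids(X, y, snps):
    """Mean genotype per class over the chosen SNP columns."""
    sums = defaultdict(lambda: [0.0] * len(snps))
    counts = defaultdict(int)
    for row, label in zip(X, y):
        counts[label] += 1
        for j, snp in enumerate(snps):
            sums[label][j] += row[snp]
    return {c: [v / counts[c] for v in sums[c]] for c in counts}


def predict(centroids, row, snps):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((row[s] - m) ** 2 for s, m in zip(snps, centroids[c]))
    return min(centroids, key=dist)


def accuracy_by_snp_count(X_train, y_train, X_test, y_test, ranked_snps):
    """Hold-out accuracy when using the top 1, 2, ..., k ranked SNPs."""
    results = []
    for k in range(1, len(ranked_snps) + 1):
        snps = ranked_snps[:k]
        centroids = fit_centroids(X_train, y_train, snps)
        hits = sum(predict(centroids, r, snps) == t for r, t in zip(X_test, y_test))
        results.append((k, hits / len(y_test)))
    return results


# Hypothetical toy data: genotypes coded 0/1/2, classes R (resistant) and S (susceptible).
# Column 0 carries real signal; column 1 separates the classes only in the training set.
X_train = [[0, 2], [0, 2], [1, 2], [2, 0], [2, 0], [1, 0]]
y_train = ["R", "R", "R", "S", "S", "S"]
X_test = [[0, 0], [2, 2], [0, 2], [2, 0]]
y_test = ["R", "S", "R", "S"]

results = accuracy_by_snp_count(X_train, y_train, X_test, y_test, ranked_snps=[0, 1])
best_k, best_acc = max(results, key=lambda kv: kv[1])
print(results, best_k)
```

In this toy case the second SNP fits the training lines but misleads on the held-out lines, so accuracy drops when it is added. That is the kind of overfitting an accuracy-based stopping point is meant to catch.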
Keywords: GWAS; feature selection; machine learning; root-knot nematode; soybean
Year: 2022 PMID: 35592556 PMCID: PMC9111516 DOI: 10.3389/fpls.2022.883280
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
FIGURE 1. Variable Importance in Projection (VIP)-based Manhattan plot of the 4,974 SNPs. The SNPs with VIP scores higher than 2.0 are highlighted in blue, and the 29 non-correlated SNPs with VIP scores higher than 2.0 selected to be used in the ML-based GWAS are colored in red.
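The two-step filter described in the caption (keep SNPs with VIP above 2.0, then keep only non-correlated ones) could be sketched as a greedy pruning pass. The VIP cutoff of 2.0 comes from the caption. The r² cutoff of 0.8, the greedy highest-VIP-first order, the function names, and the toy genotypes are assumptions, since the record does not say how correlation was assessed.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two genotype vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def select_snps(vip, genotypes, vip_min=2.0, r2_max=0.8):
    """Keep SNPs with VIP > vip_min, dropping any that correlate with a kept SNP."""
    candidates = sorted(
        (s for s in vip if vip[s] > vip_min), key=lambda s: vip[s], reverse=True
    )
    kept = []
    for snp in candidates:
        # r2_max is an assumed cutoff; r is squared so that flipped allele coding
        # (r = -1) still counts as fully correlated.
        if all(pearson_r(genotypes[snp], genotypes[k]) ** 2 <= r2_max for k in kept):
            kept.append(snp)
    return kept


# Hypothetical VIP scores and genotypes (coded 0/1/2) for four made-up SNP names.
vip = {"snpA": 3.1, "snpB": 2.6, "snpC": 2.2, "snpD": 1.4}
genotypes = {
    "snpA": [0, 0, 1, 2, 2, 1],
    "snpB": [2, 2, 1, 0, 0, 1],  # same signal as snpA with the alleles flipped
    "snpC": [2, 0, 1, 0, 2, 1],  # uncorrelated with snpA
    "snpD": [0, 1, 2, 0, 1, 2],  # below the VIP threshold
}

selected = select_snps(vip, genotypes)
print(selected)
```

Here `snpB` is dropped because it duplicates `snpA`, and `snpD` never enters because its VIP score is below the threshold.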
Summary of SVM classification accuracy metrics based on the number of predictors.
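| # SNPs | Overall accuracy | MCC | Resistant: Accuracy | Resistant: Precision | Resistant: Specificity | Moderate: Accuracy | Moderate: Precision | Moderate: Specificity | Susceptible: Accuracy | Susceptible: Precision | Susceptible: Specificity |
|---|---|---|---|---|---|---|---|---|---|---|---|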
| 1 | 0.74 | 0.53 | 0.87 | 0.55 | 0.80 | 0.50 | 0.00 | 1.00 | 0.80 | 0.84 | 0.73 |
| 2 | 0.74 | 0.51 | 0.85 | 0.59 | 0.84 | 0.52 | 0.29 | 0.96 | 0.80 | 0.84 | 0.73 |
| 3 | 0.75 | 0.54 | 0.85 | 0.59 | 0.84 | 0.53 | 0.50 | 0.98 | 0.80 | 0.83 | 0.71 |
| 4 | 0.76 | 0.56 | 0.86 | 0.62 | 0.86 | 0.53 | 0.50 | 0.98 | 0.82 | 0.84 | 0.71 |
| 5 | | | | | | | | | | | |
| 6 | 0.78 | 0.59 | 0.86 | 0.62 | 0.86 | 0.56 | 0.75 | 0.99 | 0.83 | 0.85 | 0.73 |
| 7 | 0.77 | 0.57 | 0.86 | 0.62 | 0.86 | 0.54 | 0.67 | 0.99 | 0.82 | 0.84 | 0.71 |
| 8 | 0.77 | 0.57 | 0.86 | 0.62 | 0.86 | 0.54 | 0.67 | 0.99 | 0.82 | 0.84 | 0.71 |
| 9 | 0.76 | 0.56 | 0.86 | 0.62 | 0.86 | 0.53 | 0.50 | 0.98 | 0.82 | 0.84 | 0.71 |
| 10 | 0.76 | 0.56 | 0.86 | 0.60 | 0.85 | 0.53 | 0.50 | 0.98 | 0.82 | 0.85 | 0.73 |
| 11 | 0.76 | 0.55 | 0.86 | 0.60 | 0.85 | 0.52 | 0.33 | 0.97 | 0.83 | 0.85 | 0.75 |
| 12 | 0.74 | 0.52 | 0.86 | 0.62 | 0.86 | 0.51 | 0.25 | 0.95 | 0.81 | 0.84 | 0.73 |
| 13 | 0.76 | 0.57 | 0.88 | 0.63 | 0.86 | 0.54 | 0.38 | 0.96 | 0.83 | 0.86 | 0.76 |
| 14 | 0.75 | 0.54 | 0.86 | 0.62 | 0.86 | 0.53 | 0.33 | 0.95 | 0.82 | 0.85 | 0.75 |
| 15 | 0.75 | 0.54 | 0.86 | 0.62 | 0.86 | 0.53 | 0.33 | 0.95 | 0.82 | 0.85 | 0.75 |
| 16 | 0.75 | 0.54 | 0.86 | 0.62 | 0.86 | 0.53 | 0.33 | 0.95 | 0.82 | 0.85 | 0.75 |
| 17 | 0.75 | 0.54 | 0.86 | 0.60 | 0.85 | 0.52 | 0.29 | 0.96 | 0.82 | 0.85 | 0.75 |
| 18 | 0.76 | 0.55 | 0.86 | 0.60 | 0.85 | 0.54 | 0.38 | 0.96 | 0.83 | 0.86 | 0.76 |
| 19 | 0.76 | 0.55 | 0.86 | 0.60 | 0.85 | 0.54 | 0.38 | 0.96 | 0.83 | 0.86 | 0.76 |
| 20 | 0.76 | 0.55 | 0.86 | 0.60 | 0.85 | 0.54 | 0.38 | 0.96 | 0.83 | 0.86 | 0.76 |
| 21 | 0.75 | 0.54 | 0.85 | 0.61 | 0.86 | 0.53 | 0.33 | 0.95 | 0.82 | 0.85 | 0.75 |
| 22 | 0.74 | 0.53 | 0.85 | 0.57 | 0.82 | 0.51 | 0.25 | 0.97 | 0.82 | 0.85 | 0.75 |
| 23 | 0.74 | 0.53 | 0.86 | 0.60 | 0.85 | 0.51 | 0.25 | 0.95 | 0.82 | 0.85 | 0.75 |
| 24 | 0.74 | 0.53 | 0.83 | 0.56 | 0.82 | 0.51 | 0.33 | 0.98 | 0.82 | 0.84 | 0.73 |
| 25 | 0.74 | 0.51 | 0.80 | 0.64 | 0.89 | 0.58 | 0.38 | 0.92 | 0.81 | 0.84 | 0.73 |
| 26 | 0.75 | 0.53 | 0.82 | 0.67 | 0.90 | 0.60 | 0.41 | 0.92 | 0.81 | 0.84 | 0.73 |
| 27 | 0.74 | 0.53 | 0.83 | 0.56 | 0.82 | 0.54 | 0.67 | 0.99 | 0.80 | 0.83 | 0.71 |
| 28 | 0.74 | 0.52 | 0.83 | 0.57 | 0.83 | 0.53 | 0.50 | 0.98 | 0.80 | 0.83 | 0.71 |
| 29 | 0.73 | 0.50 | 0.83 | 0.57 | 0.83 | 0.53 | 0.33 | 0.95 | 0.81 | 0.85 | 0.75 |
The bold rows are the combination of SNPs with the highest accuracy.
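The metrics in the table can all be derived from a three-class confusion matrix. The sketch below is one way to compute them and rests on assumptions: the per-class "accuracy" is taken here as one-vs-rest accuracy, (TP + TN) / N, which the record does not confirm; MCC uses the standard multiclass generalization; and the confusion matrix is invented for illustration, not taken from the study.

```python
from math import sqrt


def classification_metrics(confusion, labels):
    """Overall accuracy, multiclass MCC, and one-vs-rest metrics per class.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    true_counts = [sum(confusion[i]) for i in range(n)]
    pred_counts = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # Multiclass MCC: (c*s - sum(p_k*t_k)) / sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = numerator / denominator if denominator else 0.0

    per_class = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = pred_counts[k] - tp
        fn = true_counts[k] - tp
        tn = total - tp - fp - fn
        per_class[label] = {
            "accuracy": (tp + tn) / total,  # assumed one-vs-rest definition
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        }
    return {"overall_accuracy": correct / total, "mcc": mcc, "per_class": per_class}


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
labels = ["Resistant", "Moderate", "Susceptible"]
confusion = [
    [8, 1, 1],
    [2, 3, 5],
    [1, 1, 18],
]

metrics = classification_metrics(confusion, labels)
print(round(metrics["overall_accuracy"], 3), round(metrics["mcc"], 3))
```

In this sketch a class that is never predicted gets a precision of 0.0 and a specificity of 1.0, the same pattern as the Moderate columns in the 1-SNP row, although the record does not say how the authors handled that case.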
Summary of RF classification accuracy metrics based on the number of predictors.
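| # SNPs | Overall accuracy | MCC | Tolerant: Accuracy | Tolerant: Precision | Tolerant: Specificity | Moderate: Accuracy | Moderate: Precision | Moderate: Specificity | Susceptible: Accuracy | Susceptible: Precision | Susceptible: Specificity |
|---|---|---|---|---|---|---|---|---|---|---|---|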
| 2 | 0.73 | 0.50 | 0.85 | 0.59 | 0.84 | 0.50 | 0.17 | 0.96 | 0.79 | 0.83 | 0.71 |
| 3 | 0.74 | 0.54 | 0.85 | 0.59 | 0.84 | 0.51 | 0.33 | 0.98 | 0.80 | 0.83 | 0.69 |
| 4 | 0.77 | 0.58 | 0.85 | 0.59 | 0.84 | 0.59 | 0.63 | 0.98 | 0.84 | 0.87 | 0.78 |
| 5 | 0.78 | 0.59 | 0.86 | 0.62 | 0.86 | 0.57 | 0.57 | 0.98 | 0.84 | 0.86 | 0.76 |
| 6 | 0.77 | 0.57 | 0.86 | 0.62 | 0.86 | 0.54 | 0.43 | 0.97 | 0.84 | 0.86 | 0.76 |
| 7 | 0.77 | 0.58 | 0.86 | 0.62 | 0.86 | 0.54 | 0.43 | 0.97 | 0.84 | 0.86 | 0.76 |
| 8 | 0.76 | 0.59 | 0.83 | 0.62 | 0.87 | 0.56 | 0.40 | 0.95 | 0.84 | 0.86 | 0.76 |
| 9 | 0.77 | 0.58 | 0.86 | 0.61 | 0.85 | 0.58 | 0.56 | 0.97 | 0.84 | 0.87 | 0.78 |
| 10 | 0.76 | 0.57 | 0.84 | 0.60 | 0.85 | 0.60 | 0.50 | 0.95 | 0.84 | 0.88 | 0.80 |
| 11 | 0.79 | 0.60 | 0.86 | 0.62 | 0.86 | 0.60 | 0.60 | 0.97 | 0.86 | 0.88 | 0.80 |
| 12 | 0.79 | 0.62 | 0.87 | 0.63 | 0.87 | 0.62 | 0.64 | 0.97 | 0.86 | 0.88 | 0.80 |
| 13 | 0.79 | 0.61 | 0.87 | 0.63 | 0.87 | 0.62 | 0.64 | 0.97 | 0.86 | 0.88 | 0.80 |
| 14 | 0.79 | 0.63 | 0.86 | 0.62 | 0.86 | 0.62 | 0.64 | 0.97 | 0.85 | 0.88 | 0.80 |
| 15 | 0.78 | 0.60 | 0.86 | 0.62 | 0.86 | 0.59 | 0.63 | 0.98 | 0.84 | 0.86 | 0.76 |
| 16 | 0.78 | 0.56 | 0.87 | 0.63 | 0.87 | 0.58 | 0.56 | 0.97 | 0.84 | 0.86 | 0.76 |
| 17 | 0.79 | 0.55 | 0.85 | 0.67 | 0.90 | 0.61 | 0.67 | 0.98 | 0.82 | 0.84 | 0.71 |
| 18 | 0.79 | 0.53 | 0.84 | 0.68 | 0.90 | 0.61 | 0.67 | 0.98 | 0.82 | 0.83 | 0.69 |
| 19 | 0.79 | 0.59 | 0.82 | 0.67 | 0.90 | 0.60 | 0.55 | 0.96 | 0.84 | 0.85 | 0.73 |
| 20 | 0.79 | 0.56 | 0.84 | 0.68 | 0.90 | 0.60 | 0.55 | 0.96 | 0.85 | 0.86 | 0.75 |
| 21 | | | | | | | | | | | |
| 22 | 0.79 | 0.57 | 0.88 | 0.67 | 0.89 | 0.60 | 0.60 | 0.97 | 0.84 | 0.86 | 0.76 |
| 23 | 0.79 | 0.57 | 0.86 | 0.66 | 0.89 | 0.58 | 0.56 | 0.97 | 0.84 | 0.86 | 0.75 |
| 24 | 0.78 | 0.60 | 0.86 | 0.66 | 0.89 | 0.59 | 0.63 | 0.98 | 0.82 | 0.84 | 0.71 |
| 25 | 0.78 | 0.59 | 0.87 | 0.65 | 0.88 | 0.56 | 0.44 | 0.96 | 0.84 | 0.86 | 0.76 |
| 26 | 0.77 | 0.57 | 0.85 | 0.63 | 0.87 | 0.60 | 0.55 | 0.96 | 0.83 | 0.86 | 0.76 |
| 27 | 0.79 | 0.59 | 0.86 | 0.68 | 0.90 | 0.60 | 0.60 | 0.97 | 0.82 | 0.85 | 0.73 |
| 28 | 0.78 | 0.60 | 0.84 | 0.65 | 0.89 | 0.61 | 0.67 | 0.98 | 0.82 | 0.84 | 0.71 |
| 29 | 0.78 | 0.59 | 0.84 | 0.65 | 0.89 | 0.59 | 0.63 | 0.98 | 0.82 | 0.84 | 0.71 |
The bold rows are the combination of SNPs with the highest accuracy.
FIGURE 2. Prediction accuracy of RF models by the number of SNPs included as predictors.
Summary of significant SNP-trait associations identified by GWAS using the BLINK, FarmCPU, and ECMLM models.
| SNP | Chr | Position | MAF | BLINK | FarmCPU | ECMLM | Significant models |
|---|---|---|---|---|---|---|---|
| | 10 | 1,232,205 | 0.41 | | | | BLINK, FarmCPU, ECMLM |
| | 10 | 1,586,434 | 0.20 | | | | BLINK, FarmCPU, ECMLM |
| | 10 | 1,623,075 | 0.41 | | | | BLINK, FarmCPU, ECMLM |
| | 10 | 1,426,801 | 0.37 | | 1.00000 | | BLINK, ECMLM |
| | 10 | 1,475,647 | 0.14 | | 1.00000 | | BLINK, ECMLM |
| | 14 | 3,470,438 | 0.39 | | 1.00000 | 1.00000 | BLINK |
| | 10 | 39,827,303 | 0.49 | 0.22398 | | 1.00000 | FarmCPU |
| | 10 | 1,268,065 | 0.35 | 0.14341 | 1.00000 | | ECMLM |
| | 10 | 981,062 | 0.44 | 1.00000 | 1.00000 | | ECMLM |
| | 10 | 1,341,309 | 0.25 | 1.00000 | 1.00000 | | ECMLM |
| | 10 | 925,972 | 0.47 | 1.00000 | 1.00000 | | ECMLM |
| | 10 | 2,714,130 | 0.46 | 0.08015 | 1.00000 | | ECMLM |
| | 10 | 831,916 | 0.47 | 1.00000 | 1.00000 | | ECMLM |
| | 10 | 754,804 | 0.47 | 1.00000 | 1.00000 | | ECMLM |
| | 10 | 1,051,336 | 0.37 | 1.00000 | 1.00000 | | ECMLM |
| | 10 | 14,714 | 0.11 | 0.90685 | 1.00000 | | ECMLM |
| | 10 | 406,427 | 0.14 | 1.00000 | 1.00000 | | ECMLM |
| | 10 | 2,240,113 | 0.38 | 0.90685 | 1.00000 | | ECMLM |
| | 10 | 2,482,570 | 0.38 | 0.90685 | 1.00000 | | ECMLM |
| | 10 | 2,437,001 | 0.18 | 1.00000 | 1.00000 | | ECMLM |
Associations with a p-value lower than 0.05 are in bold.