William M Singer1, Zachary Shea1, Dajun Yu2, Haibo Huang2, M A Rouf Mian3, Chao Shang1, Maria L Rosso1, Qijan J Song4, Bo Zhang1.
Abstract
Soybean [Glycine max (L.) Merr.] seeds have an amino acid profile that provides excellent viability as a food and feed protein source. However, low concentrations of an essential amino acid, methionine, limit the nutritional utility of soybean protein. The objectives of this study were to identify genomic associations and evaluate the potential for genomic selection (GS) for methionine content in soybean seeds. We performed a genome-wide association study (GWAS) that utilized 311 soybean accessions from maturity groups IV and V grown in three locations in 2018 and 2019. A total of 35,570 single nucleotide polymorphisms (SNPs) were used to identify genomic associations with proteinogenic methionine content that was quantified by high-performance liquid chromatography (HPLC). Across four environments, 23 novel SNPs were identified as being associated with methionine content. The strongest associations were found on chromosomes 3 (ss715586112, ss715586120, ss715586126, ss715586203, and ss715586204), 8 (ss715599541 and ss715599547) and 16 (ss715625009). Several gene models were recognized within proximity to these SNPs, such as a leucine-rich repeat protein kinase and a serine/threonine protein kinase. Identification of these linked SNPs should help soybean breeders to improve protein quality in soybean seeds. GS was evaluated using k-fold cross validation within each environment with two SNP sets, the complete 35,570 set and a subset of 248 SNPs determined to be associated with methionine through GWAS. Average prediction accuracy (r²) was highest using the SNP subset ranging from 0.45 to 0.62, which was a significant improvement from the complete set accuracy that ranged from 0.03 to 0.27. This indicated that GS utilizing a significant subset of SNPs may be a viable tool for soybean breeders seeking to improve methionine content.
Keywords: GWAS; genomic selection; methionine; soybean amino acid; soybean protein; sulfur-containing amino acid
Year: 2022 PMID: 35557723 PMCID: PMC9088226 DOI: 10.3389/fpls.2022.859109
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
Countries of origin and maturity groups (MG) for clustered accessions as determined by discriminant analysis of principal components (DAPC).
| Origin / MG | Cluster 1 (n = 76) Count | Cluster 1 % | Cluster 2 (n = 62) Count | Cluster 2 % | Cluster 3 (n = 47) Count | Cluster 3 % | Cluster 4 (n = 126) Count | Cluster 4 % | Total |
|---|---|---|---|---|---|---|---|---|---|
| Australia | – | – | – | – | 1 | 2.1 | – | – | 1 |
| Brazil | – | – | – | – | 1 | 2.1 | 1 | 0.8 | 2 |
| China | 55 | 72.4 | 54 | 87.1 | – | – | 65 | 51.6 | 174 |
| Costa Rica | – | – | – | – | 1 | 2.1 | – | – | 1 |
| Georgia | – | – | 1 | 1.6 | – | – | 2 | 1.6 | 3 |
| India | – | – | – | – | – | – | 1 | 0.8 | 1 |
| Indonesia | 1 | 1.3 | – | – | – | – | – | – | 1 |
| Japan | 5 | 6.6 | 4 | 6.5 | 2 | 4.3 | 15 | 11.9 | 26 |
| Morocco | – | – | – | – | – | – | 1 | 0.8 | 1 |
| Nepal | – | – | – | – | – | – | 1 | 0.8 | 1 |
| North Korea | – | – | – | – | – | – | 7 | 5.6 | 7 |
| Russia | – | – | – | – | – | – | 1 | 0.8 | 1 |
| South Korea | – | – | 1 | 1.6 | 3 | 6.4 | 14 | 11 | 18 |
| Taiwan | 3 | 3.9 | – | – | – | – | 1 | 0.8 | 4 |
| Uganda | – | – | – | – | – | – | 2 | 1.6 | 2 |
| United States | – | – | 2 | 3.2 | 37 | 78.7 | 11 | 8.7 | 50 |
| Vietnam | 11 | 14.5 | – | – | – | – | 2 | 1.6 | 13 |
| Unknown | 1 | 1.3 | – | – | 2 | 4.3 | 2 | 1.6 | 5 |
| MG IV | 36 | 47.4 | 52 | 83.9 | 37 | 78.7 | 97 | 77 | 222 |
| MG V | 40 | 52.6 | 10 | 16.1 | 10 | 21.3 | 29 | 23 | 89 |
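
The percentage columns appear to be each count divided by the size of its cluster. The following is a minimal sketch of that arithmetic, assuming the cluster sizes given in the Figure 2 caption (76, 62, 47, 126); the helper name `pct` is illustrative and not from the source.

```python
# Cluster sizes as stated in the Figure 2 caption.
CLUSTER_N = {1: 76, 2: 62, 3: 47, 4: 126}

def pct(count, cluster):
    """Share of a cluster's accessions, in percent, rounded to one decimal."""
    return round(100 * count / CLUSTER_N[cluster], 1)

# Counts copied from the China and United States rows of the table.
china = {1: 55, 2: 54, 4: 65}
united_states = {2: 2, 3: 37, 4: 11}

china_pct = {c: pct(n, c) for c, n in china.items()}
us_pct = {c: pct(n, c) for c, n in united_states.items()}
print(china_pct)
print(us_pct)
```

Under this assumption the computed values match the table entries for those two rows.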
FIGURE 1. Frequency distributions displaying proteinogenic Met concentrations collected from all environments (A), Blacksburg, VA (B), Warsaw, VA (C), and Clayton, NC (D).
FIGURE 2. (A) Bayesian information criterion for selecting the optimal number of clusters. (B) A scatter plot depicting the four clusters (k = 4) identified as likely subpopulations within the 311 accessions: cluster I (blue triangles, n = 76), cluster II (gold diamonds, n = 62), cluster III (large red circles, n = 47), cluster IV (small purple circles, n = 126).
Significant SNPs on chromosomes 3, 4, 5, 6, 8, 12, and 16 associated with Met content (g kg⁻¹ cp) in soybean seeds.
| Chr | Genomic location | SNP (position) | Wm82 allele | Alternative allele | 2018 BB | 2018 CL | 2018 Combined | 2019 BB | 2019 W | 2019 Combined |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Intergenic | ss715585365 (33765404) | T | G | NS | 4.29 | NS | NS | NS | NS |
| 3 | Intergenic | ss715586063 (39357229) | C | T | NS | NS | NS | 4.60 | NS | NS |
| 3 | Intergenic | ss715586112 (39946374) | A | G | NS | NS | NS | 5.82 | NS | NS |
| 3 | Intergenic | ss715586120 (40006278) | A | G | NS | NS | NS | 5.16 | NS | NS |
| 3 | Coding sequence | ss715586126 (40062294) | T | G | NS | NS | NS | 5.57 | NS | NS |
| 3 | Intergenic | ss715586201 (41217558) | A | G | NS | NS | NS | NS | NS | 4.37 |
| 3 | Coding sequence | ss715586203 (41228895) | G | T | NS | NS | NS | NS | NS | 5.33 |
| 3 | Intergenic | ss715586204 (41236923) | G | A | NS | NS | NS | NS | NS | 5.11 |
| 4 | Coding sequence | ss715589347 (8089953) | T | C | NS | NS | NS | NS | 4.27 | NS |
| 4 | Intron | ss715589348 (8091107) | G | A | NS | NS | NS | NS | 4.33 | NS |
| 4 | Coding sequence | ss715589349 (8095691) | C | T | NS | NS | NS | NS | 4.33 | NS |
| 5 | Intergenic | ss715590327 (27762168) | A | G | NS | NS | 4.17 | NS | NS | NS |
| 6 | Coding sequence | ss715593682 (17154269) | G | A | NS | NS | NS | NS | NS | 4.39 |
| 6 | Intergenic | ss715593752 (17453327) | C | T | NS | NS | NS | NS | NS | 4.20 |
| 8 | 3′ UTR | ss715599541 (14196322) | T | C | NS | NS | NS | 4.92 | NS | NS |
| 8 | Intergenic | ss715599547 (14226774) | G | A | NS | NS | NS | 5.81 | NS | NS |
| 12 | Intergenic | ss715613175 (5433032) | T | G | 4.22 | NS | NS | NS | NS | NS |
| 16 | Intron | ss715625002 (37660795) | A | C | NS | NS | NS | NS | 4.78 | NS |
| 16 | Intergenic | ss715625007 (37701598) | T | G | NS | NS | NS | NS | 4.38 | NS |
| 16 | Intergenic | ss715625009 (37712387) | T | C | NS | NS | NS | NS | 5.05 | NS |
| 16 | Coding sequence | ss715625012 (37737235) | C | T | NS | NS | NS | NS | 4.71 | NS |
| 16 | Intergenic | ss715625013 (37753573) | T | C | NS | NS | NS | NS | 4.74 | NS |
| 16 | Intergenic | ss715625017 (37784014) | T | C | NS | NS | NS | NS | 4.78 | NS |

Environment columns report -log10(P). ** significance threshold (5%), * suggestive threshold (25%).
FIGURE 3. SNP associations for 2018 environments, (A) combined, (B) Blacksburg, VA, and (C) Clayton, NC, are displayed in Manhattan plots with chromosomes in alternating colors, significance threshold -log10(P) > 4.91, and suggestive threshold -log10(P) > 4.16. Each respective QQ plot displays observed -log10(P) against expected -log10(P).
FIGURE 4. SNP associations for 2019 environments, (A) combined, (B) Blacksburg, VA, and (C) Warsaw, VA, are displayed in Manhattan plots with chromosomes in alternating colors, significance threshold -log10(P) > 4.91, and suggestive threshold -log10(P) > 4.16. Each respective QQ plot displays observed -log10(P) against expected -log10(P).
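
As a rough illustration of how the two thresholds in the Figure 3 and Figure 4 captions partition the reported associations, the sketch below labels each -log10(P) value from the SNP table as significant (> 4.91) or suggestive (> 4.16). The function and variable names are illustrative assumptions, not part of the study's pipeline.

```python
# Thresholds taken from the Figure 3 and Figure 4 captions.
SIGNIFICANT = 4.91
SUGGESTIVE = 4.16

def classify(neg_log10_p):
    """Label a -log10(P) value against the two caption thresholds."""
    if neg_log10_p > SIGNIFICANT:
        return "significant"
    if neg_log10_p > SUGGESTIVE:
        return "suggestive"
    return "NS"

# The single non-NS -log10(P) value reported for each SNP in the table.
reported = {
    "ss715585365": 4.29, "ss715586063": 4.60, "ss715586112": 5.82,
    "ss715586120": 5.16, "ss715586126": 5.57, "ss715586201": 4.37,
    "ss715586203": 5.33, "ss715586204": 5.11, "ss715589347": 4.27,
    "ss715589348": 4.33, "ss715589349": 4.33, "ss715590327": 4.17,
    "ss715593682": 4.39, "ss715593752": 4.20, "ss715599541": 4.92,
    "ss715599547": 5.81, "ss715613175": 4.22, "ss715625002": 4.78,
    "ss715625007": 4.38, "ss715625009": 5.05, "ss715625012": 4.71,
    "ss715625013": 4.74, "ss715625017": 4.78,
}

labels = {snp: classify(value) for snp, value in reported.items()}
significant = sorted(snp for snp, lab in labels.items() if lab == "significant")
print(len(reported), len(significant))
print(significant)
```

With these cutoffs, the SNPs that clear the significance threshold are the same eight that the abstract names as the strongest associations on chromosomes 3, 8, and 16.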
Candidate gene models and descriptions within 10 kb flanking regions of significantly associated SNPs using Wm82.a2.v1.
| Chr | SNP | Candidate genes | Gene function description | Expression in soybean reproductive tissue |
|---|---|---|---|---|
| 3 | ss715586112 | Glyma.03g188100 | Modifier of rudimentary protein | High expression in flowers |
| 3 | ss715586112 | Glyma.03g188200 | Nucleic acid binding | NA |
| 3 | ss715586112 | Glyma.03g188300 | Pollen Ole e 1 allergen and extensin family protein | Little to no expression in reproductive tissue |
| 3 | ss715586112 | Glyma.03g188400 | Eukaryotic aspartyl protease family protein | Moderate to high expression in seeds and pods |
| 3 | ss715586120 | Glyma.03g188900 | Ubiquitin-protein ligase 7 | High expression in flowers, pods, and seeds |
| 3 | ss715586120 | Glyma.03g189000 | Pentatricopeptide repeat (PPR) superfamily protein | Moderate to high expression in flowers, pods, and seeds |
| 3 | ss715586120 | Glyma.03g189100 | Exostosin family protein | Moderate to high expression in seeds |
| 3 | ss715586126 | Glyma.03g189700 | Pyruvate kinase family protein | Moderate to high expression in seeds |
| 3 | ss715586126 | Glyma.03g189800 | Leucine-rich repeat (LRR) protein kinase family protein | High expression in pods |
| 3 | ss715586203 | Glyma.03g203900 | Polyketide cyclase/dehydrase/lipid transport superfamily protein | NA |
| 3 | ss715586203 | Glyma.03g204000 | Mal d 1-associated protein | Moderate expression in flowers, pods, and seeds |
| 3 | ss715586203 | Glyma.03g204100 | Calmodulin-domain protein kinase cdpk isoform 2 | Moderate to high expression in pods |
| 3 | ss715586204 | Glyma.03g204200 | TPX2 (targeting protein for Xklp2) protein family | Little to no expression in reproductive tissue |
| 8 | ss715599541 | Glyma.08g177000 | RING/U-box superfamily protein | High expression in flowers and pods |
| 8 | ss715599541 | Glyma.08g177100 | NA | Little to no expression in reproductive tissue |
| 8 | ss715599541 | Glyma.08g177200 | Arabinogalactan protein 1 | NA |
| 8 | ss715599541 | Glyma.08g177300 | GTP cyclohydrolase II | Little to no expression in reproductive tissue |
| 8 | ss715599547 | Glyma.08g177400 | Dicarboxylate transport 2.1 | Moderate expression in pods and seeds |
| 8 | ss715599547 | Glyma.08g177500 | Pyrimidine 2 | Moderate expression in flowers |
| 8 | ss715599547 | Glyma.08g177600 | Centrin2 | High expression in flowers; moderate expression in pods |
| 16 | ss715625009 | Glyma.16g219800 | WRKY DNA-binding protein 70 | Little to no expression in reproductive tissue |
| 16 | ss715625009 | Glyma.16g219900 | B-block binding subunit of TFIIIC | NA |
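
One plausible reading of the "within 10 kb flanking regions" criterion is an interval-overlap test around each SNP position. The sketch below assumes that reading. The SNP position is the one listed for ss715586112 in the SNP association table, but the gene names and coordinates are hypothetical placeholders, since the record does not give gene coordinates.

```python
FLANK_BP = 10_000  # 10 kb on each side of the SNP, per the table caption

def genes_near(snp_pos, genes, flank=FLANK_BP):
    """Return names of genes whose interval overlaps [snp_pos - flank, snp_pos + flank].

    Assumption: partial overlap with the window counts as "within" it.
    """
    lo, hi = snp_pos - flank, snp_pos + flank
    return [name for name, start, end in genes if start <= hi and end >= lo]

snp_pos = 39_946_374  # ss715586112, position from the SNP association table

# HYPOTHETICAL gene intervals (name, start, end), for illustration only.
genes = [
    ("gene_A", 39_930_000, 39_934_000),  # ends before the window starts
    ("gene_B", 39_935_000, 39_938_500),  # overlaps the upstream edge
    ("gene_C", 39_945_000, 39_948_000),  # spans the SNP itself
    ("gene_D", 39_956_000, 39_960_000),  # overlaps the downstream edge
    ("gene_E", 39_960_500, 39_963_000),  # starts after the window ends
]

candidates = genes_near(snp_pos, genes)
print(candidates)
```

Whether a gene must lie wholly inside the window or merely overlap it is a design choice the record does not specify; the overlap rule here is the more permissive option.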
FIGURE 5. Boxplots displaying 100 r² values (k = 5, 20 iterations) for GS models using 35,570 SNPs (ALL) and 248 SNPs (Subset) across environments (BB = Blacksburg, VA; CL = Clayton, NC; W = Warsaw, VA; Average = mean Met across all environments).
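
The count of 100 r² values follows from 5 folds times 20 repeated partitions. The sketch below shows only that bookkeeping, under loud assumptions: the data are synthetic, the predictor is a single made-up score, and a one-variable least-squares line stands in for the genomic prediction model, which this record does not describe. Here r² is taken as the squared Pearson correlation between predicted and observed values in each held-out fold.

```python
import random

def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def repeated_kfold_r2(x, y, k=5, iterations=20, seed=0):
    """Collect one r2 per held-out fold: k * iterations values in total."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    scores = []
    for _ in range(iterations):
        rng.shuffle(idx)  # new random partition for each iteration
        folds = [idx[i::k] for i in range(k)]
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
            predicted = [slope * x[i] + intercept for i in test]
            scores.append(pearson_r2(predicted, [y[i] for i in test]))
    return scores

# Synthetic stand-in data: 311 values, matching only the number of accessions.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(311)]
y = [0.7 * xi + data_rng.gauss(0, 1) for xi in x]

scores = repeated_kfold_r2(x, y)
print(len(scores), sum(scores) / len(scores))
```

Swapping in a real genomic prediction model would change only `fit_line` and the prediction step; the fold loop and the per-fold r² collection would stay the same.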