Jun Qin1, Fengmin Wang1, Qingsong Zhao1, Ainong Shi2, Tiantian Zhao1, Qijian Song3, Waltram Ravelombola4, Hongzhou An1, Long Yan1, Chunyan Yang1, Mengchen Zhang1.
Abstract
Soybean is a primary source of meal protein for human consumption and for poultry and livestock feed. In this study, quantitative trait loci (QTLs) controlling protein content were explored via genome-wide association studies (GWAS) and linkage mapping, based on 284 soybean accessions and 180 recombinant inbred lines (RILs), respectively, which were evaluated for protein content over 4 years. A total of 22 single nucleotide polymorphisms (SNPs) associated with protein content were detected using the mixed linear model (MLM) and general linear model (GLM) methods in Tassel, and 5 QTLs were detected using Bayesian interval mapping (IM), single-trait multiple interval mapping (SMIM), single-trait composite interval mapping maximum likelihood estimation (SMLE), and single marker regression (SMR) models in Q-Gene and IciMapping. Major QTLs were detected on chromosomes 6 and 20 in both populations. The new QTL genomic region on chromosome 6 (Chr6_18844283-19315351) included 7 candidate genes, and the Hap.X AA at the Chr6_19172961 position was associated with high protein content. Genomic selection (GS) for protein content was performed using Bayesian Lasso (BL) and ridge regression best linear unbiased prediction (rrBLUP), based on all the SNPs and on the SNPs significantly associated with protein content in GWAS. The results showed that BL and rrBLUP performed similarly; GS accuracy depended on the SNP set and the training population size. GS efficiency was higher for the GWAS-derived SNPs than for random SNPs and reached a plateau when the number of markers was >2,000. The SNP markers identified in this study, together with the other information reported, are essential for establishing efficient marker-assisted selection (MAS) and GS pipelines for improving soybean protein content.
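As an illustration of the rrBLUP step mentioned in the abstract, the following is a minimal sketch that treats rrBLUP as ridge regression on marker dosages and measures prediction accuracy as the correlation between predicted and observed values in a held-out set. It is not the authors' pipeline: the data are simulated, the function names `rrblup_fit` and `rrblup_predict` are hypothetical, and the shrinkage parameter `lam` is fixed by hand, whereas rrBLUP software normally estimates it from variance components.

```python
import numpy as np

def rrblup_fit(Z, y, lam):
    """Ridge estimate of marker effects: u = (Zc'Zc + lam*I)^-1 Zc'(y - mean(y))."""
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                      # center each marker column
    mu = y.mean()
    A = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    u = np.linalg.solve(A, Zc.T @ (y - mu))
    return mu, col_means, u

def rrblup_predict(model, Z_new):
    """Predict phenotypes for new genotypes from a fitted (mu, col_means, u) tuple."""
    mu, col_means, u = model
    return mu + (Z_new - col_means) @ u

# Simulated data (assumption): 300 lines, 100 SNPs coded 0/1/2, 20 causal SNPs.
rng = np.random.default_rng(0)
n, p = 300, 100
Z = rng.integers(0, 3, size=(n, p)).astype(float)
true_u = np.zeros(p)
true_u[:20] = rng.normal(0.0, 0.5, size=20)
y = 40.0 + Z @ true_u + rng.normal(0.0, 1.0, size=n)

# Hold out the last 60 lines as a validation set.
train, test = np.arange(0, 240), np.arange(240, n)
model = rrblup_fit(Z[train], y[train], lam=50.0)   # lam is a hand-picked value
pred = rrblup_predict(model, Z[test])
accuracy = np.corrcoef(pred, y[test])[0, 1]        # prediction accuracy (Pearson r)
print(f"prediction accuracy r = {accuracy:.2f}")
```

Restricting the columns of `Z` to a chosen SNP subset before fitting is one simple way to compare marker sets of different sizes under this sketch.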
Keywords: Glycine max; genome-wide association study; genomic selection; genotyping by sequencing; protein content; single nucleotide polymorphism
Year: 2022 PMID: 35783963 PMCID: PMC9244705 DOI: 10.3389/fpls.2022.882732
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
FIGURE 1 (A) QTL mapping of seed protein content on soybean chromosome 6 based on single-trait multiple IM (SMIM) in Q-Gene. (B) The QTL qtl-chr6_prot mapped on the combined physical and genetic map of chromosome 6, where the x-axis shows physical distance (Mbp) and the y-axis shows genetic position (cM).
Single nucleotide polymorphism (SNP) markers/quantitative trait locus (QTL) detected in recombinant inbred line (RIL) and natural populations.
| SNP marker/QTL | Population | Model | Confidence interval | Physical position (bp) | LOD | Posterior (POP) | PVE (%) |
|---|---|---|---|---|---|---|---|
| qtl-chr6_prot | RIL | Bayesian IM | 142 | 18864382 | | 0.847 | |
| | | Single-trait multiple IM (SMIM) | 146–152 | 18580363–18597849 | 11.46 | | 25.40 |
| | | Single-trait CIM MLE (SMLE) | 142–152 | 18580363–18864382 | 13.1 | | |
| | | Single marker regression (SMR) | 141.9–144.5 | 18449510–19398117 | 13.7 | | 29.60 |
| | | ICIM | 144 | 18449510–18597849 | 14.11 | | 22.30 |
| Chr6_18658898 | POP | MLM | | 18658898 | 19.95 | | |
| | | GLM | | 18658898 | 25.76 | | |
| qtl-chr8_prot | RIL | Bayesian IM | 56–58 | 9318625–9502316 | | 0.392–0.49 | |
| | | Single-trait multiple IM (SMIM) | 42–44 | 7270752–8285888 | 7.16 | | 16.70 |
| | | Single-trait CIM MLE (SMLE) | 42–44 | 7270752–8285888 | 7.05 | | |
| | | Single marker regression (SMR) | 41.6–45.6 | 7270752–8285888 | 6.79 | | 16.10 |
| | | ICIM | 61 | 9701254–9877332 | 6.35 | | 9.18 |
| qtl-chr15_prot | RIL | Bayesian IM | 12 | 1890050 | | 0.179 | |
| | | | 32 | 4708800–4708818 | | 0.759 | |
| | | | 42 | 5786875 | | 0.119 | |
| | | Single-trait multiple IM (SMIM) | 20–32 | 3303648–4708818 | 3.28 | | 8.00 |
| | | Single-trait CIM MLE (SMLE) | 18–20 | 3380704–3303648 | 4.43 | | |
| | | | 30–50 | 4708800–6651199 | 4.93 | | |
| | | Single marker regression (SMR) | 14.2–19.7 | 2095208–3303648 | 4.57 | | 11.00 |
| | | | 27.7–31.8 | 4370908–4708800 | 3.91 | | 9.50 |
| | | | 45.1–54.5 | 6037184–7193889 | 4.43 | | 10.70 |
| | | ICIM | 20 | 3303648–3488588 | 4.72 | | 6.60 |
| qtl-chr17_prot1 | RIL | Bayesian IM | 100 | 12398690–12801544 | | 0.941 | |
| | | Single-trait multiple IM (SMIM) | 100–124 | 12398690–13632893 | 4.11 | | 9.80 |
| | | Single-trait CIM MLE (SMLE) | 104–112 | 12801549–13813134 | 4.05 | | 9.90 |
| | | Single marker regression (SMR) | 99.5–103.9 | 12398690–12801549 | 4.4 | | |
| qtl-chr20_prot | RIL | Bayesian IM | 112 | 33202705 | | 0.871 | |
| | | Single-trait multiple IM (SMIM) | 94 | 33202705 | 6.31 | | 14.90 |
| | | Single-trait CIM MLE (SMLE) | 86–114 | 26572911–33224754 | 5.34 | | |
| | | Single marker regression (SMR) | 93–115.3 | 26572981–33507017 | 5.12 | | 12.30 |
| | | ICIM | 97 | 26957096–27003724 | 7.16 | | 10.22 |
| Chr20_34423091 | POP | MLM | | 34423091 | 7.21 | | |
| | | GLM | | 34423091 | 6.55 | | |
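One way to read the table above is to ask how close each GWAS SNP lies to the linkage interval reported for the same chromosome. The sketch below does this for chromosomes 6 and 20, using the single marker regression (SMR) physical intervals and the GWAS SNP positions copied from the table. The choice of the SMR rows and the helper name `distance_to_interval` are assumptions made for illustration; the authors' own co-localization criterion is not stated here.

```python
# Physical intervals (bp) from the SMR rows of the table; chosen for illustration.
ril_intervals = {
    "qtl-chr6_prot": (18449510, 19398117),
    "qtl-chr20_prot": (26572981, 33507017),
}

# GWAS SNP positions (bp) from the POP rows of the same table.
gwas_snps = {
    "qtl-chr6_prot": 18658898,    # Chr6_18658898
    "qtl-chr20_prot": 34423091,   # Chr20_34423091
}

def distance_to_interval(pos, interval):
    """Return 0 if pos lies inside the interval, else the bp distance to its nearest edge."""
    start, end = sorted(interval)
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0

distances = {
    qtl: distance_to_interval(gwas_snps[qtl], ril_intervals[qtl])
    for qtl in ril_intervals
}
for qtl, dist in distances.items():
    print(qtl, "inside interval" if dist == 0 else f"{dist} bp outside interval")
```

Under these inputs the chromosome 6 SNP falls inside the SMR interval, while the chromosome 20 SNP lies a little under 1 Mb beyond its upper bound.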
FIGURE 2 Structure analysis. (A) Delta K values for different numbers of subpopulations (K) from the STRUCTURE analysis; the x-axis shows K and the y-axis shows the corresponding delta K value. (B) Classification of the 284 accessions into four subpopulations using STRUCTURE version 2.3.4, where the x-axis shows accessions and the y-axis shows the probability (from 0 to 1) of each accession belonging to each subpopulation (Q = K). Subpopulation membership is indicated by color (Q1, red; Q2, green; Q3, blue; and Q4, yellow). (C) Principal component analysis (PCA) of the population structure, showing the distribution of the accessions in the association panel along PC1 and PC2.
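The PCA in panel (C) can be illustrated with a minimal sketch that computes principal component scores from a genotype dosage matrix by singular value decomposition. The genotypes are simulated for two hypothetical groups with different allele frequencies, and the function name `genotype_pca` is an assumption; the software actually used for the published PCA is not specified here.

```python
import numpy as np

def genotype_pca(G, n_components=2):
    """PCA of an accessions x SNPs dosage matrix via SVD of the column-centered data."""
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # PC coordinates per accession
    explained = (S ** 2) / np.sum(S ** 2)             # variance fraction per PC
    return scores, explained[:n_components]

# Simulated panel (assumption): two groups of 40 accessions, 300 SNPs each,
# with group-specific allele frequencies.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.1, 0.9, size=300)
freq_b = rng.uniform(0.1, 0.9, size=300)
G = np.vstack([
    rng.binomial(2, freq_a, size=(40, 300)),
    rng.binomial(2, freq_b, size=(40, 300)),
]).astype(float)

scores, explained = genotype_pca(G)
print("variance explained by PC1, PC2:", np.round(explained, 3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of PC1 versus PC2 scatter described in the caption.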
Significant SNPs associated with protein content over 4 years: chromosome (Chr.) and physical position (bp) of each significant SNP, logarithm of odds (LOD) [-log10(p-value)] values from the general linear model (GLM) and mixed linear model (MLM), and the allele with a positive effect at the SNP locus.
| SNP marker | Chr. | Position (bp) | Region (H, heterochromatic; E, euchromatic) | SNP type | Allele with positive effect | LOD of GLM | LOD of MLM | SNP annotation |
|---|---|---|---|---|---|---|---|---|
| Chr03_34851073 | 3 | 34,851,073 | E | A/C | C | 12.79 | 13.69 | Glyma.03G133300 |
| Chr03_42692363 | 3 | 42,692,363 | E | C/T | C | 10.22 | 9.18 | Glyma.03G224600 |
| Chr05_40074496 | 5 | 40,074,496 | E | A/T | T | 20.03 | 25.86 | Glyma.05G221300 |
| Chr05_41114434 | 5 | 41,114,434 | E | C/T | C | 13.08 | 13.12 | Upstream_gene_variant, MODIFIER, Glyma.05G234000 |
| Chr06_14606307 | 6 | 14,606,307 | E | A/G | G | 9.30 | 8.41 | Upstream_gene_variant, MODIFIER, Glyma.06G173600 |
| Chr06_18658898 | 6 | 18,658,898 | H | A/G | A | 19.95 | 25.76 | Glyma.06G202000 |
| Chr08_10757609 | 8 | 10,757,609 | E | C/T | C | 7.23 | 6.65 | Glyma.08G140700 |
| Chr09_5898756 | 9 | 5,898,756 | E | A/G | G | 8.80 | 8.45 | Glyma.09G062100 |
| Chr09_45699847 | 9 | 45,699,847 | E | A/G | G | 8.55 | 7.64 | Glyma.09G234500 |
| Chr10_2992389 | 10 | 2,992,389 | E | A/T | A | 8.47 | 7.94 | Glyma.10G034400 |
| Chr10_44549078 | 10 | 44,549,078 | E | A/G | G | 9.22 | 8.80 | Glyma.10G213000 |
| Chr12_1536444 | 12 | 1,536,444 | E | A/G | G | 7.48 | 6.29 | Glyma.12G021400 |
| Chr14_2351357 | 14 | 2,351,357 | E | C/T | C | 10.58 | 9.26 | Glyma.14G032300 |
| Chr14_48312781 | 14 | 48,312,781 | E | C/G | G | 20.07 | 26.26 | Upstream_gene_variant, MODIFIER, Glyma.14G218000 |
| Chr15_13541492 | 15 | 13,541,492 | H | C/G | C | 8.48 | 7.71 | Upstream_gene_variant, MODIFIER, Glyma.15G160000 |
| Chr17_347445 | 17 | 347,445 | E | A/T | A | 9.38 | 9.40 | Glyma.17G003000 |
| Chr17_32480031 | 17 | 32,480,031 | H | A/G | G | 6.64 | 6.73 | Intergenic_region, MODIFIER, Glyma.17G203300-Glyma.17G203400 |
| Chr18_7837981 | 18 | 7,837,981 | E | A/C | C | 14.27 | 13.62 | Glyma.18G081200 |
| Chr18_18834295 | 18 | 18,834,295 | E | A/C | C | 8.24 | 7.25 | Intergenic_region, MODIFIER, Glyma.18G133000-Glyma.18G133100 |
| Chr18_50849168 | 18 | 50,849,168 | E | A/T | A | 11.29 | 11.66 | Upstream_gene_variant, MODIFIER, Glyma.18G221300 |
| Chr19_12210884 | 19 | 12,210,884 | H | C/T | T | 19.91 | 25.75 | Intergenic_region, MODIFIER, Glyma.19G060900-Glyma.19G061000 |
| Chr20_34423091 | 20 | 34,423,091 | E | A/T | T | 7.21 | 6.55 | Glyma.20G100900 |
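Because the LOD values in this table are defined as -log10(p-value), they can be converted back to p-values and compared with a Bonferroni-style threshold. The sketch below shows the arithmetic. The marker count `n_markers` is a hypothetical placeholder, since the number of SNPs tested is not given in this section, and the three LOD values are MLM entries copied from the table.

```python
import math

def lod_to_p(lod):
    """Invert LOD = -log10(p) to recover the p-value."""
    return 10.0 ** (-lod)

def bonferroni_lod(n_tests, alpha=0.05):
    """LOD threshold equivalent to a Bonferroni-corrected p-value of alpha / n_tests."""
    return -math.log10(alpha / n_tests)

# Hypothetical number of tested SNPs (assumption, not taken from the study).
n_markers = 50_000
threshold = bonferroni_lod(n_markers)

# MLM LOD values copied from the table above.
mlm_lod = {
    "Chr06_18658898": 25.76,
    "Chr12_1536444": 6.29,
    "Chr20_34423091": 6.55,
}
for snp, lod in mlm_lod.items():
    status = "passes" if lod >= threshold else "fails"
    print(f"{snp}: p = {lod_to_p(lod):.2e}, {status} LOD threshold {threshold:.2f}")
```

A larger marker count raises the threshold, so whether the weaker associations pass depends directly on the number of tests assumed.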
FIGURE 3 (A) Heatmap of linkage disequilibrium (LD) between each pair of markers that passed the Bonferroni threshold in the genome-wide association study (GWAS), based on pairwise r² values indicated by color intensity. (B) Candidate genes for each single nucleotide polymorphism (SNP) locus. The bottom panel depicts the extent of LD in the regions based on pairwise r² values, indicated by the color intensity index shown. (C) Boxplot of seed protein content for different genotypes in soybean accessions. (D) Boxplot of seed protein based on Hap.X and Hap.X phenotypic differences between genotype combinations of the two SNPs.
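The pairwise r² values behind an LD heatmap such as panel (A) can be approximated as the squared Pearson correlation between SNP dosage vectors. The following is a minimal sketch under that assumption; it uses toy genotypes, the function names `ld_r2` and `ld_matrix` are hypothetical, and dedicated LD software may estimate r² from haplotypes instead.

```python
import numpy as np

def ld_r2(g1, g2):
    """Squared Pearson correlation between two SNP dosage vectors (coded 0/1/2)."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    if g1.std() == 0 or g2.std() == 0:
        return float("nan")          # r2 is undefined for a monomorphic SNP
    r = np.corrcoef(g1, g2)[0, 1]
    return float(r * r)

def ld_matrix(G):
    """Pairwise r2 between the columns (SNPs) of an accessions x SNPs matrix."""
    R = np.corrcoef(G, rowvar=False)
    return R ** 2

# Toy genotypes (assumption): 6 accessions x 3 SNPs; SNP 2 mirrors SNP 1.
G = np.array([
    [0, 2, 0],
    [0, 2, 2],
    [2, 0, 0],
    [2, 0, 2],
    [0, 2, 0],
    [2, 0, 2],
], dtype=float)

r2 = ld_matrix(G)
print(np.round(r2, 2))
```

Each off-diagonal cell of `r2` corresponds to one colored cell in a heatmap of this kind.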
FIGURE 4 Boxplots showing the effect of different SNP density sets on genomic selection with the Bayesian Lasso Regression (BLR) and ridge regression best linear unbiased prediction (rrBLUP) models.
FIGURE 5 Boxplots showing the effect of training population size on genomic selection accuracy, assessed by cross-validation at different numbers of folds with 100 replications per fold setting using rrBLUP.
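The link between the number of cross-validation folds and the training population size can be made concrete with a short sketch of replicated k-fold splitting: with k folds, roughly (k - 1)/k of the accessions are used for training in each split. The fold numbers 2, 5, and 10 and the generator name `replicated_kfold` are assumptions for illustration; only the panel size of 284 accessions is taken from the abstract, and 3 replications are used here instead of 100 to keep the example quick.

```python
import numpy as np

def replicated_kfold(n, k, reps, seed=0):
    """Yield (train_idx, test_idx) pairs for `reps` random repetitions of k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(reps):
        perm = rng.permutation(n)
        for fold in np.array_split(perm, k):
            test = np.sort(fold)
            train = np.setdiff1d(perm, fold)   # everything not in the test fold
            yield train, test

n_accessions = 284   # size of the association panel reported in the abstract

# Smallest training-set size encountered for each (hypothetical) fold number.
sizes = {
    k: min(len(train) for train, _ in replicated_kfold(n_accessions, k, reps=3))
    for k in (2, 5, 10)
}
for k, size in sizes.items():
    print(f"{k}-fold CV: training size about {size} of {n_accessions}")
```

Fitting a prediction model on each `train` set and correlating its predictions with the observed values in `test` would give one accuracy value per split, which is the quantity summarized by boxplots of this kind.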