Sungwoo Lee, Kyujung Van, Mikyung Sung, Randall Nelson, Jonathan LaMantia, Leah K McHale, M A Rouf Mian.
Abstract
KEY MESSAGE: Genomic regions associated with seed protein, oil and amino acid contents were identified by genome-wide association analyses. Geographic distributions of haplotypes indicate scope of improvement of these traits.

Soybean [Glycine max (L.) Merr.] protein and oil are used worldwide in feed, food and industrial materials. Increasing seed protein and oil contents is important; however, protein content is generally negatively correlated with oil content. We conducted a genome-wide association study using phenotypic data collected from five environments for 621 accessions in maturity groups I-IV and 34,014 markers to identify quantitative trait loci (QTL) for seed content of protein, oil and several essential amino acids. Three and five genomic regions were associated with seed protein and oil contents, respectively. One, three, one and four genomic regions were associated with cysteine, methionine, lysine and threonine content (g kg−1 crude protein), respectively. As previously shown, QTL on chromosomes 15 and 20 were associated with seed protein and oil contents, with both exhibiting opposite effects on the two traits, and the chromosome 20 QTL having the most significant effect. A multi-trait mixed model identified trait-specific QTL. A QTL on chromosome 5 increased oil with no effect on protein content, and a QTL on chromosome 10 increased protein content with little effect on oil content. The chromosome 10 QTL co-localized with maturity gene E2/GmGIa. Identification of trait-specific QTL indicates feasibility to reduce the negative correlation between protein and oil contents. Haplotype blocks were defined at the QTL identified on chromosomes 5, 10, 15 and 20. Frequencies of positive effect haplotypes varied across maturity groups and geographic regions, providing guidance on which alleles have potential to contribute to soybean improvement for specific regions.
Year: 2019 PMID: 30806741 PMCID: PMC6531425 DOI: 10.1007/s00122-019-03304-5
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Fig. 1 Phenotypic distribution of seed protein and oil contents (a) by scaled best linear unbiased predictor (BLUP) values across all environments (ALL) among the 621 plant introductions and their correlation (P < 0.0001) (b). Phenotypic distribution of amino acids by scaled BLUP values across all environments is also shown (c)
Variance component estimates and broad-sense heritability of traits assessed in 621 soybean accessions grown at Wooster, OH, in 2014 and 2015 (OHW14, OHW15), Columbus, OH, in 2015 (OHC15), Urbana, IL, in 2015 (IL15) and Plymouth, NC, in 2015 (NC15)
| Parameter | Protein (g kg−1) | Oil (g kg−1) | Methionine (g kg−1 cp) | Cysteine (g kg−1 cp) | Lysine (g kg−1 cp) | Threonine (g kg−1 cp) |
|---|---|---|---|---|---|---|
| Environment | 0.87 | 1.05 | 0.05 | 0.04 | 0.12 | 0.01 |
| Replication (environment) | 0.00 | 0.00 | 0.02 | 0.04 | 0.04 | 0.02 |
| Block (replication × environment) | 0.87 | 0.24 | 0.01 | 0.02 | 0.05 | 0.06 |
| Genotype | 3.21 | 1.71 | 0.08 | 0.11 | 0.35 | 0.28 |
| Genotype × environment | 0.56 | 0.17 | 0.02 | 0.05 | 0.12 | 0.06 |
| Error | 0.97 | 0.23 | 0.06 | 0.16 | 0.26 | 0.17 |
| Broad-sense heritability | 0.94 | 0.97 | 0.88 | 0.80 | 0.87 | 0.90 |
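The heritability row can be related to the variance components above with a short calculation. The sketch below uses the common entry-mean formulation, H² = σ²g / (σ²g + σ²ge/e + σ²error/(e·r)). This excerpt does not state the formula or the number of replications, so both the formula and `N_REP = 2` are assumptions. Under those assumptions, the protein and oil values round to the tabulated 0.94 and 0.97.

```python
def broad_sense_heritability(var_g, var_ge, var_error, n_env, n_rep):
    """Entry-mean broad-sense heritability from variance components.

    ASSUMPTION: this is the common entry-mean formula; the excerpt does not
    say which formula the authors used.
    """
    phenotypic = var_g + var_ge / n_env + var_error / (n_env * n_rep)
    return var_g / phenotypic


# (genotype, genotype x environment, error) components copied from the table.
components = {
    "protein": (3.21, 0.56, 0.97),
    "oil": (1.71, 0.17, 0.23),
}

N_ENV = 5  # five environments, as listed in the table caption
N_REP = 2  # ASSUMPTION: replications per environment are not given here

h2 = {
    trait: broad_sense_heritability(g, ge, err, N_ENV, N_REP)
    for trait, (g, ge, err) in components.items()
}

for trait, value in h2.items():
    print(f"{trait}: H2 = {value:.2f}")
```

Because genotype × environment and error variances are divided by the number of environments and plots, heritability on an entry-mean basis stays high even when those components are a sizeable fraction of the genotypic variance.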
Fig. 2 Population structure of the 621 soybean accessions. a Plot of STRUCTURE analysis (K = 2). Accessions were sorted by the geographic location from which each accession was collected, and colored bars correspond to the STRUCTURE assignments (Q1 and Q2). b Principal component analysis (PCA) of the 621 soybean accessions with the country of origin indicated by color of marker
Fig. 3 Manhattan plots (left) and QQ-plots (right) for GWAS of the 621 soybean accessions for protein (a) and oil (b) contents using the multi-locus mixed model, and for opposite effects on the two traits (c) using the multi-trait mixed model. The trait associations for 34,014 SNPs are plotted for all environments combined (ALL) (a and b) or for the Wooster, Ohio 2015 environment (OHW15) (c). Red and blue horizontal lines in the Manhattan plots and markers in the QQ-plots represent the genome-wide significance threshold (5%) and the suggestive significance threshold (25%), respectively, and the SNPs significantly associated at those levels. Shaded regions of the QQ-plots represent a 95% confidence interval (color figure online)
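A QQ-plot of the kind described in this caption pairs each observed −log10(P) with the value expected if no marker were associated with the trait. The following is a minimal sketch of that pairing. The plotting position (rank − 0.5)/n is one common convention and is an assumption here, as are the function name and the toy P-values.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, largest values first.

    ASSUMPTION: expected quantiles use the (rank - 0.5) / n plotting
    position; the source does not say which convention was used.
    """
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10((rank - 0.5) / n) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


# Toy P-values for illustration only (not taken from the study).
points = qq_points([0.5, 0.05, 0.005, 0.25])
for expected, observed in points:
    print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points that rise above the diagonal at the upper end suggest true associations, while a departure along the whole line points to uncorrected population structure.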
Significant SNPs on chromosomes 15, 19 and 20 associated with protein and oil contents (g kg−1 seed) from multi-locus mixed model and multi-trait mixed model-opposite analyses
*, suggestive threshold (25%); **, suggestive threshold (10%); ***, genome-wide significance threshold (5%)
ᵃ Linkage disequilibrium (LD) blocks were constructed based on the four-gamete method. Blocks were merged if adjacent blocks were separated by < 10 kb
ᵇ Williams 82
ᶜ Minor allele frequency
ᵈ OHW14, Wooster, OH, 2014; OHW15, Wooster, OH, 2015; OHC15, Columbus, OH, 2015; IL15, Urbana, IL, 2015; NC15, Plymouth, NC, 2015; ALL, all environments
ᵉ Allelic effect of alternative allele relative to Williams 82
ᶠ Locus associated with an opposite effect for protein and oil identified by multi-trait mixed model
ᵍ SNP was not in LD with any other SNPs; thus, LD block was defined by positions of adjacent markers
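The four-gamete method named in the footnotes treats two biallelic markers as belonging to the same block when fewer than four of the possible two-locus gametes (00, 01, 10, 11) are observed. Seeing all four implies a historical recombination event between them. The sketch below is a simplified illustration, not the authors' pipeline: it assumes phased 0/1 haplotypes (reasonable for largely homozygous accessions), ignores any gamete frequency cutoff, and uses hypothetical function names and toy data.

```python
def has_four_gametes(snp_a, snp_b):
    """True if all four two-locus gametes are observed for a marker pair."""
    gametes = {
        (a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None
    }
    return len(gametes) == 4


def four_gamete_blocks(haplotypes):
    """Split markers into consecutive runs with no four-gamete violation.

    haplotypes: one list of 0/1 alleles per accession, markers in map order.
    Returns (first_marker_index, last_marker_index) tuples.
    """
    n_markers = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_markers)]
    blocks, start = [], 0
    for j in range(1, n_markers):
        # Close the current block if marker j conflicts with any member.
        if any(has_four_gametes(columns[i], columns[j]) for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, n_markers - 1))
    return blocks


# Toy data: markers 0 and 1 are perfectly linked, marker 2 recombines.
toy_haplotypes = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
toy_blocks = four_gamete_blocks(toy_haplotypes)
print(toy_blocks)
```

A marker that conflicts with both neighbors ends up in a block of its own, which matches the situation in footnote g, where a block had to be delimited by the positions of adjacent markers.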
Trait-specific significant SNPs identified by multi-trait mixed model
*, suggestive threshold (25%); **, suggestive threshold (10%); ***, genome-wide significance threshold (5%)
ᵃ Linkage disequilibrium (LD) blocks were constructed based on the four-gamete method. Blocks were merged if adjacent blocks were separated by < 10 kb
ᵇ Williams 82
ᶜ Minor allele frequency
ᵈ OHW14, Wooster, OH, 2014; OHW15, Wooster, OH, 2015; OHC15, Columbus, OH, 2015; IL15, Urbana, IL, 2015; NC15, Plymouth, NC, 2015; ALL, all environments
ᵉ Allelic effect of alternative allele relative to Williams 82
Fig. 4 The 40–42.5 Mb region on Chr 5 covering significantly associated trait-specific SNPs identified by the multi-trait mixed model. Negative log10 P-values for the Illinois 2015 environment (IL15) are plotted against physical genomic position (Glyma.Wm82.a2.v1). Horizontal lines are as described in Fig. 3. Previously identified QTL are indicated with horizontal arrows and were obtained from SoyBase (http://soybase.org)
Fig. 5 The 43–48 Mb region on Chr 10 covering significantly associated trait-specific SNPs identified by the multi-trait mixed model. Negative log10 P-values for the Wooster, Ohio 2014 environment (OHW14) are plotted against physical position (Glyma.Wm82.a2.v1). Horizontal lines and arrows are as described in Fig. 4. The maturity gene (E2), indicated by the vertical line, is coincident with these significant markers
Fig. 6 Manhattan plots (left) and QQ-plots (right) for the genome-wide association study of the 621 soybean accessions using the multi-locus mixed model for methionine (a), cysteine (b), lysine (c) and threonine (d) on a g kg−1 cp basis across all environments (ALL). Horizontal lines, markers and shading are as described in Fig. 3
Linkage disequilibrium (LD) blocks significantly associated with amino acid content (g kg−1 crude protein) by multi-locus mixed model
| Chr | LD block position rangeᵃ | Most significant SNP/LD block | Position | Trait-environment | −log10(P) |
|---|---|---|---|---|---|
| 1 | 1237296–1314722 | ss715578474 | 1309778 | Met-OHC15 | 6.86*** |
| | | | | Met-ALL | 6.26*** |
| | | | | Met-IL15 | 5.37** |
| 3 | 42759210–42819489 | ss715586331 | 42783646 | Cys-IL15 | 5.72*** |
| 9 | 41468783–41529869 | ss715604076 | 41499208 | Thr-OHC15 | 5.66*** |
| 10 | 45250482–45546527 | ss715607486 | 45325872 | Thr-NC15 | 6.40*** |
| 11 | 5865872–5987980 | ss715610921 | 5886407 | Thr-IL15 | 5.49*** |
| | | | | Thr-NC15 | 5.08* |
| | | | | Thr-ALL | 5.03* |
| 15 | 3863922–3985288 | ss715621799 | 3936757 | Met-ALL | 6.30*** |
| | | | | Met-OHC15 | 5.74*** |
| | | | | Met-NC15 | 4.78* |
| 18 | 4886585–4996669 | ss715631030 | 4953024 | Met-OHC15 | 5.44*** |
| | | | | Met-ALL | 4.74* |
| 20 | 31554795–32384035 | ss715637294 | 32282623 | Thr-OHW14 | 10.88*** |
| | | | | Lys-OHW14 | 10.25*** |
| | | | | Thr-ALL | 5.03* |
| | | | | Lys-ALL | 4.95* |
*, suggestive threshold (25%); **, suggestive threshold (10%); ***, genome-wide significance threshold (5%)
ᵃ LD blocks were constructed based on the four-gamete method. Blocks were merged if adjacent blocks were separated by < 10 kb
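The merging rule in the footnote (adjacent blocks separated by less than 10 kb are joined) can be sketched as a single pass over sorted intervals. The function name and the toy coordinates are hypothetical. Only the 10 kb cutoff comes from the footnote.

```python
def merge_close_blocks(blocks, max_gap_bp=10_000):
    """Merge adjacent (start, end) blocks whose gap is below max_gap_bp.

    blocks: iterable of (start_bp, end_bp) tuples on one chromosome.
    """
    merged = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] < max_gap_bp:
            # Gap to the previous block is < 10 kb: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy coordinates for illustration only (not from the study).
toy = [(100_000, 150_000), (155_000, 200_000), (400_000, 450_000)]
merged_toy = merge_close_blocks(toy)
print(merged_toy)
```

Here the first two blocks are 5 kb apart and are joined, while the third lies 200 kb away and stays separate.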
Haplotypes of QTL for protein and oil contents by maturity group on Chrs 5, 10, 15 and 20 (color table online)
* SNPs significant at the 5% genome-wide significance threshold
ᵃ Average seed protein or oil content (%) of individuals with this haplotype from the 621 Plant Introductions (PIs) minus the average of all 621 PIs for the ALL environment (average of 621 PIs: 37.60% protein, 16.63% oil on a 13% moisture basis)
ᵇ Haplotype contributing a positive effect as determined by MTMM analysis
ᶜ Haplotypes were classified only with soybean accessions having SoySNP50K data (SoyBase, http://soybase.org/snps/) and maturity group information (GRIN, http://www.ars-grin.gov/cgi-bin/npgs/html/crop.pl?51)
ᵈ Haplotype contributing a positive effect for seed protein content by MLMM analysis
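The haplotype effect defined in the first footnote is a simple difference: the mean trait value of accessions carrying a haplotype minus the mean over all accessions. The sketch below shows that arithmetic. The function name, haplotype labels and trait values are hypothetical, chosen only to make the calculation easy to follow.

```python
from statistics import mean


def haplotype_effects(records):
    """Mean trait value per haplotype minus the overall mean.

    records: list of (haplotype_label, trait_value) pairs, one per accession.
    """
    overall = mean(value for _, value in records)
    by_haplotype = {}
    for label, value in records:
        by_haplotype.setdefault(label, []).append(value)
    return {label: mean(values) - overall for label, values in by_haplotype.items()}


# Toy protein contents (%) for illustration only (not from the study).
toy_records = [("H1", 38.0), ("H1", 39.0), ("H2", 36.0), ("H2", 37.0)]
effects = haplotype_effects(toy_records)
for label, effect in effects.items():
    print(f"{label}: {effect:+.2f}")
```

A positive value marks a haplotype whose carriers exceed the panel average for that trait, which is how the positive-effect haplotypes are flagged in the footnotes.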
Fig. 7 Distribution of haplotypes of trait-specific QTL for protein and oil on Chr 5 (a) and Chr 10 (b) and QTL for protein and oil on Chr 15 (c) and Chr 20 (d). The frequency of each haplotype, illustrated in pie charts, is placed according to the geographic locations of major populations from Russia, Asia and North America. The size of each pie chart reflects the number of accessions in the region. Haplotypes are as described in Table 5. The map was created using the R packages ‘maps’ and ‘mapdata’