Jiaoping Zhang, Qijian Song, Perry B Cregan, Guo-Liang Jiang.
Abstract
KEY MESSAGE: Twenty-two loci for soybean SW and candidate genes conditioning seed development were identified, and the prediction accuracies of GS and MAS were estimated through cross-validation and validation with unrelated populations.

Soybean (Glycine max) is a major crop for plant protein and oil production, and seed weight (SW) is important for yield and quality in food/vegetable uses of soybean. However, our knowledge of the genes controlling SW remains limited. To better understand the molecular mechanism underlying the trait and to explore marker-based breeding approaches, we conducted a genome-wide association study in a population of 309 soybean germplasm accessions using 31,045 single nucleotide polymorphisms (SNPs), and estimated the prediction accuracy of genomic selection (GS) and marker-assisted selection (MAS) for SW. Twenty-two loci of minor effect associated with SW were identified, including hotspots on Gm04 and Gm19. The mixed model containing these loci explained 83.4% of the phenotypic variation. Candidate genes with Arabidopsis orthologs conditioning SW were also proposed. The prediction accuracies of GS and MAS by cross-validation were 0.75-0.87 and 0.62-0.75, respectively, depending on the number of SNPs used and the size of the training population. GS also outperformed MAS when validation was performed using unrelated panels across a wide range of maturities, with an average prediction accuracy of 0.74 versus 0.53.

This study convincingly demonstrated that soybean SW is controlled by numerous minor-effect loci. It greatly enhances our understanding of the genetic basis of SW in soybean and facilitates the identification of genes controlling the trait. It also suggests that GS holds promise for accelerating soybean breeding progress. The results are helpful for genetic improvement and genomic prediction of yield in soybean.
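The cross-validated prediction accuracy reported in the abstract can be illustrated with a minimal sketch. It assumes that accuracy is the Pearson correlation between predicted and observed phenotypes in held-out accessions, and it uses ridge regression on simulated genotypes as a stand-in for a GS model. The abstract does not state which model the authors used, and every name and parameter here (`ridge_fit`, `cv_accuracy`, `lam`, the simulated data) is hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Fit ridge regression; return (intercept, marker_effects)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mean(y)); the intercept is not penalized
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def cv_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Mean Pearson correlation between predictions and observations over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        b0, beta = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[test_idx] @ beta
        accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accs))

# Simulated data (NOT the study's data): 300 lines, 200 SNPs coded 0/1/2
rng = np.random.default_rng(42)
n_lines, n_snps = 300, 200
X = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)
effects = rng.normal(0.0, 0.3, size=n_snps)           # many small effects
y = X @ effects + rng.normal(0.0, 1.0, size=n_lines)  # phenotype = genetic value + noise

accuracy = cv_accuracy(X, y, k=5, lam=10.0)
print(f"mean cross-validated accuracy: {accuracy:.2f}")
```

Varying `n_snps` or the fraction of lines used for training in such a sketch mirrors, in spirit, how the reported accuracies depend on marker number and training population size.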
Year: 2015 PMID: 26518570 PMCID: PMC4703630 DOI: 10.1007/s00122-015-2614-x
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Fig. 1 Manhattan plot of GWAS for 100-seed weight (HSW) in soybean. Negative log10-transformed P values of SNPs from a genome-wide scan for HSW, using a mixed linear model that includes both kinship and population structure, are plotted against positions on each of the 20 chromosomes. The significant trait-associated SNPs (P < 7.9 × 10⁻⁵) are distinguished by the threshold line and colored in red (color figure online)
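As a small worked illustration of the transformation used in the Manhattan plot, the sketch below converts P values to the −log10 scale and compares them with the threshold from the caption. The three P values are taken from the loci table in this record; the variable and function names are illustrative only.

```python
import math

THRESHOLD_P = 7.9e-5  # significance threshold stated in the Fig. 1 caption

# P values of three lead SNPs, as listed in the loci table
p_values = {
    "Gm04_11182315_A_G": 7.27e-05,
    "Gm10_37088544_G_T": 1.84e-08,
    "Gm19_42921997_A_G": 2.11e-06,
}

def neg_log10(p):
    """Height of a SNP on the Manhattan plot's vertical axis."""
    return -math.log10(p)

threshold_line = neg_log10(THRESHOLD_P)
scores = {snp: neg_log10(p) for snp, p in p_values.items()}

# A SNP is significant when it lies above the threshold line
significant = [snp for snp, s in scores.items() if s > threshold_line]

print(f"threshold line at {threshold_line:.2f}")
for snp, s in scores.items():
    print(f"{snp}: {s:.2f}")
```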
Loci and SNPs significantly associated with seed weight, predicted candidate genes and previously reported QTLs for seed weight at similar genome regions
| Loci | SNP ID (a) | Alleles (b) | MAF | Allelic effect | R² | P value | Known QTLs (c) | Candidate genes (d) | Annotation |
|---|---|---|---|---|---|---|---|---|---|
| SW1 | Gm04_11182315_A_G | G:A | 0.36 | 0.54 | 0.018 | 7.27E−05 | Seed weight 5-2 | ||
| SW2 | Gm04_14213918_C_T | T:C | 0.29 | 0.73 | 0.026 | 3.11E−06 | Seed weight 20-2, 36-15, seed length 1-14, seed height 1-13 and seed volume 1-10 | ||
| SW3 | Gm04_43653965_T_C | T:C | 0.12 | 0.73 | 0.021 | 2.35E−05 | |||
| SW4 | Gm04_47414790_G_A | A:G | 0.10 | 0.81 | 0.022 | 1.28E−05 | |||
| SW5 | Gm06_13151347_C_T | C:T | 0.08 | 0.89 | 0.018 | 6.71E−05 | |||
| SW6 | Gm06_15115808_C_T | T:C | 0.13 | 0.70 | 0.027 | 1.76E−06 | Seed weight 2-2 | ||
| SW7 | Gm06_15154965_T_C | C:T | 0.05 | 0.97 | 0.025 | 3.34E−06 | Seed weight 2-2 | ||
| SW8 | Gm07_15662403_C_T | T:C | 0.34 | −0.43 | 0.019 | 5.71E−05 | |||
| SW9 | Gm08_13444545_A_G | G:A | 0.06 | 0.75 | 0.018 | 7.26E−05 | Seed weight 34-13 and 35-1 | ||
| SW10 | Gm09_40164588_A_G | G:A | 0.10 | 0.72 | 0.019 | 4.98E−05 | Seed weight 2-7 and 27-3 | ||
| SW11 | Gm10_37088544_G_T | G:T | 0.06 | 1.29 | 0.038 | 1.84E−08 | Seed weight 27-2, 34-8, 25-4 and 37-6 | Glyma10g28250 | Transcription factor MYB61-like |
| SW12 | Gm12_4670638_A_C | C:A | 0.06 | 0.91 | 0.022 | 1.27E−05 | |||
| SW13 | Gm14_5352488_T_G | G:T | 0.08 | 0.78 | 0.029 | 7.35E−07 | Seed weight 36-14 | ||
| SW14 | Gm14_6072858_G_A | A:G | 0.12 | 0.72 | 0.031 | 2.58E−07 | Seed weight 36-14 | Glyma14g08050 | ARM repeat superfamily protein |
| SW15 | Gm14_6324126_G_A | A:G | 0.06 | 1.29 | 0.029 | 7.16E−07 | Seed weight 36-14 | Glyma14g08280 | Nucleotide-diphospho-sugar transferase family protein |
| SW16 | Gm15_13314408_G_A | A:G | 0.05 | 0.86 | 0.020 | 3.75E−05 | Seed weight 29-2 | ||
| SW17 | Gm18_59490788_A_G | G:A | 0.06 | 0.71 | 0.019 | 6.18E−05 | |||
| SW18 | Gm18_61540919_C_T | T:C | 0.08 | 0.89 | 0.023 | 1.11E−05 | |||
| SW19 | Gm19_41013395_C_T | C:T | 0.27 | 0.54 | 0.023 | 7.88E−06 | Seed weight 35-7, 15-7 | Glyma19g33421 | AHP protein |
| SW20 | Gm19_41144271_A_G | A:G | 0.05 | 0.98 | 0.025 | 2.89E−05 | Seed weight 35-7, 15-7 | Glyma19g33550 | Unknown |
| SW21 | Gm19_42921997_A_G | G:A | 0.43 | −0.53 | 0.026 | 2.11E−06 | Seed weight 5-1, 15-7, 17-1, 34-7, 35-7, 36-7. Seed volume 1-7, 1-8, seed length 1-10 and seed height 1-10,1-11, seed width 1-8 | Glyma19g35180 | AUX/IAA family protein |
| SW22 | Gm20_481573_G_A | A:G | 0.35 | −0.49 | 0.026 | 3.06E−06 | Seed weight 34-5 |
(a) Each ID starts with the version of the Joint Genome Institute (JGI 1.01) G. max genome sequence, followed by the chromosome number, the physical position of the marker on that chromosome and the two alleles of the locus (Schmutz et al. 2010). The first of the two alleles at each locus is the Williams 82 allele
(b) With respect to the minor allele
(c) Based on the QTL list on SoyBase (http://www.soybase.org)
(d) Genes annotated in the Glyma1.1, Glyma1.0 and NCBI RefSeq gene models in SoyBase (http://www.soybase.org) were used as the source of candidate genes
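The SNP naming convention described in the first footnote can be unpacked programmatically. The following is a minimal sketch, assuming IDs always have the four underscore-separated fields seen in the table (`Gm` plus chromosome number, position, and two alleles); the `SnpId` type and `parse_snp_id` function are hypothetical helpers, not part of any published tool.

```python
from typing import NamedTuple

class SnpId(NamedTuple):
    chromosome: int     # e.g. 10 for Gm10
    position: int       # physical position on that chromosome
    first_allele: str   # per the footnote, the Williams 82 allele
    second_allele: str

def parse_snp_id(snp_id: str) -> SnpId:
    """Split an ID such as 'Gm10_37088544_G_T' into its components."""
    parts = snp_id.split("_")
    if len(parts) != 4 or not parts[0].startswith("Gm"):
        raise ValueError(f"unexpected SNP ID format: {snp_id!r}")
    chrom, pos, first, second = parts
    return SnpId(int(chrom[2:]), int(pos), first, second)

parsed = parse_snp_id("Gm10_37088544_G_T")
print(parsed)
```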
Fig. 2 Candidate genes for loci associated with seed weight on Gm19 and the phenotypic difference between alleles at each locus. a SW20, b SW19 and c SW21. The top of the left panel shows a 0.5-Mb region on each side of the lead SNP, whose position is indicated by a vertical blue dashed line. Negative log10-transformed P values of SNPs from the MLM are plotted on the vertical axis. The significance threshold is indicated by the gray dashed line at q = 0.05. The color of each SNP indicates its r² value with the lead SNP, as shown in the color index at the top right of the panel. The bottom of the left panel shows putative genes (green bars) within the 50-kb region on each side of the lead SNP. The candidate gene is indicated by an arrow. The boxplot on the right shows the distribution of average 100-seed weight over four environments for each allele at the locus. The number of individuals carrying each allele is given in parentheses. The box shows the first, second (median) and third quartiles. The width of the box is proportional to the square root of the number of individuals for each allele. The whiskers extend to 1.5 times the interquartile range or to the data extreme, whichever is smaller. The difference in means (Δm), the correlation coefficient (r) and the P value for the correlation are also given (color figure online)
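The boxplot conventions in this caption (quartile box, width scaled by the square root of group size, whiskers capped at 1.5 times the interquartile range) can be made concrete with a short sketch. It assumes the common convention that a whisker ends at the most extreme observation lying within the 1.5 × IQR fence, and it uses invented seed-weight values; the quartile method (`inclusive`) is also an assumption, since the caption does not specify one.

```python
import math
import statistics

def boxplot_stats(values):
    """Quartiles, whisker ends and relative box width for one allele group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data point inside the fences
    lower_whisker = min(v for v in values if v >= lower_fence)
    upper_whisker = max(v for v in values if v <= upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (lower_whisker, upper_whisker),
        "relative_width": math.sqrt(len(values)),  # box width ~ sqrt(n)
    }

# Invented 100-seed weights (g) for one allele group; 30 lies outside the fence
example = [10, 12, 13, 14, 15, 16, 30]
stats = boxplot_stats(example)
print(stats)
```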
Fig. 3 Candidate genome ranges for seed weight loci SW11 and SW22. Shown are the genome regions harboring SW11 on Gm10 (a) and SW22 on Gm20 (b). In the top panel, the negative log10-transformed P values of SNPs from GWAS for seed weight are plotted against the physical positions in the given chromosomal region. The bottom panel depicts the extent of LD in this region, based on r². The r² values are indicated by the color key. The candidate region for each locus is delimited by two vertical gray dashed lines. Genes within this region are shown in the middle panel. Genes whose transcripts accumulate during seed filling, according to the “seed development transcript count” track on SoyBase (http://soybase.org/), are highlighted in bold red (color figure online)
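For readers unfamiliar with the r² measure of linkage disequilibrium shown in these panels, the sketch below computes it for a pair of biallelic SNPs. It assumes haplotype-style 0/1 coding, which is a reasonable simplification for largely homozygous inbred lines, and uses made-up allele vectors; the source does not describe how its r² values were computed.

```python
def ld_r2(a, b):
    """r^2 between two biallelic loci coded 0/1 across the same individuals.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), where D = pAB - pA * pB.
    """
    if len(a) != len(b) or not a:
        raise ValueError("allele vectors must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    d = p_ab - p_a * p_b
    return d * d / denom

# Made-up allele calls for eight lines
lead_snp = [1, 1, 0, 0, 1, 0, 1, 0]
same_snp = [1, 1, 0, 0, 1, 0, 1, 0]   # identical pattern: complete LD
weak_snp = [1, 0, 1, 0, 1, 0, 1, 0]   # partially correlated pattern

r2_same = ld_r2(lead_snp, same_snp)
r2_weak = ld_r2(lead_snp, weak_snp)
print(r2_same, r2_weak)
```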
Fig. 4 Prediction accuracies of genomic selection (GS) and marker-assisted selection (MAS) for the association panel and the panels obtained from GRIN. a, b The average prediction accuracies over 1000 iterations of GS and MAS for seed weight, respectively. The number of SNPs used for prediction is indicated in the legend. For GS with a subset of SNPs, an equal number of SNPs was randomly selected from each chromosome. For MAS, the prediction accuracies with 15 randomly selected SNPs (R15) are also plotted as a control. c The prediction accuracies of GS with the entire set of SNPs and of MAS with the 15 selected trait-associated SNPs for seed weight in the four GRIN panels. The maturities of the individuals in each panel are indicated in parentheses
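The SNP subsampling described in this caption, where an equal number of SNPs is drawn at random from each chromosome, can be sketched as follows. The marker IDs are synthetic placeholders in a `GmNN_position` style, and `sample_per_chromosome` is a hypothetical helper; the authors' actual sampling code is not given in the source.

```python
import random
from collections import defaultdict

def sample_per_chromosome(snp_ids, per_chrom, seed=None):
    """Randomly pick `per_chrom` SNPs from every chromosome.

    The chromosome is read from the text before the first underscore.
    """
    rng = random.Random(seed)
    by_chrom = defaultdict(list)
    for snp in snp_ids:
        by_chrom[snp.split("_")[0]].append(snp)
    chosen = []
    for chrom in sorted(by_chrom):
        pool = by_chrom[chrom]
        if len(pool) < per_chrom:
            raise ValueError(f"{chrom} has only {len(pool)} SNPs")
        chosen.extend(rng.sample(pool, per_chrom))  # sampling without replacement
    return chosen

# Synthetic marker list: 20 chromosomes with 50 placeholder SNPs each
all_snps = [f"Gm{c:02d}_{1000 * i}" for c in range(1, 21) for i in range(1, 51)]
subset = sample_per_chromosome(all_snps, per_chrom=5, seed=1)
print(len(subset), subset[:3])
```

Repeating the draw with different seeds and averaging the resulting accuracies would correspond to the iteration scheme mentioned in the caption.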
Lead SNPs of loci selected by stepwise method for MAS based on Akaike information criterion
| No. of SNPs | Selected SNPs for genetic breeding values prediction |
|---|---|
| 5 | Gm04_14213918, Gm14_5352488, Gm18_59490788, Gm19_41013395, Gm20_481573 |
| 10 | Gm04_14213918, Gm06_13151347, Gm09_40164588, Gm10_37088544, Gm14_5352488, Gm18_59490788, Gm19_41013395, Gm19_41144271, Gm19_42921997, Gm20_481573 |
| 15 | Gm04_14213918, Gm04_47414790, Gm06_13151347, Gm06_15115808, Gm07_15662403, Gm09_40164588, Gm10_37088544, Gm14_5352488, Gm14_6324126, Gm18_59490788, Gm18_61540919, Gm19_41013395, Gm19_41144271, Gm19_42921997, Gm20_481573 |
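Selecting a marker subset by the Akaike information criterion can be illustrated with a simplified forward-selection sketch. This is not the authors' procedure: the source says only that a stepwise method based on AIC was used, so the forward-only search, the ordinary least-squares model, the Gaussian AIC formula `n * ln(RSS / n) + 2k` and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def aic_linear(X, y):
    """Gaussian AIC (up to a constant) of an OLS fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def forward_select(X, y, max_snps):
    """Add one SNP at a time while AIC keeps decreasing."""
    selected = []
    best_aic = aic_linear(X[:, selected], y)
    while len(selected) < max_snps:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        aic, j = min((aic_linear(X[:, selected + [j]], y), j) for j in candidates)
        if aic >= best_aic:   # no candidate improves the model
            break
        selected.append(j)
        best_aic = aic
    return selected

# Simulated inbred lines (NOT the study's data): genotypes coded 0 or 2
rng = np.random.default_rng(1)
n_lines, n_snps = 200, 30
X = rng.integers(0, 2, size=(n_lines, n_snps)).astype(float) * 2
y = 1.5 * X[:, 3] - 1.0 * X[:, 17] + rng.normal(0.0, 1.0, size=n_lines)

selected = forward_select(X, y, max_snps=5)
print("selected SNP columns:", selected)
```

A full stepwise procedure would also consider dropping previously added SNPs at each step; that backward move is omitted here to keep the sketch short.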
Allelic segregation of the 15 selected loci in the five plant introductions (PIs) with extreme seed weight (SW) in the association panel (a)
(a) The allele with a positive effect at each locus is highlighted in red
(b) Shown are the averages over 4 environments and 3 replications for each environment