Tengfei Zhang1, Tingting Wu1, Liwei Wang1, Bingjun Jiang1, Caixin Zhen1, Shan Yuan1, Wensheng Hou1,2, Cunxiang Wu1, Tianfu Han1, Shi Sun1.
Abstract
Soybean is an excellent source of vegetable protein and edible oil. Understanding the genetic basis of protein and oil content will improve soybean breeding programs. Linkage analysis and a genome-wide association study (GWAS) were combined to detect quantitative trait loci (QTL) associated with protein and oil content in soybean. Three hundred and eight recombinant inbred lines (RILs) genotyped with 3454 single nucleotide polymorphism (SNP) markers and 200 soybean accessions genotyped with 94,462 SNPs and indels were used to identify QTL intervals and significant SNP loci. Intervals on chromosomes 1, 15, and 20 were associated with both traits, and the QTL qPro15-1, qPro20-1, and qOil5-1 reproducibly explained large proportions of the phenotypic variation. SNP loci on chromosome 20 that overlapped with qPro20-1 were reproducibly associated with both traits by GWAS (p < 10⁻⁴). Twenty-five candidate genes with putative roles in protein and/or oil metabolism were identified within two regions (qPro15-1, qPro20-1), and eight of these genes were differentially expressed between the parent lines during late reproductive growth stages, consistent with a role in controlling protein and oil content. The newly defined QTL should significantly improve molecular breeding programs, and the identified candidate genes may help elucidate the mechanisms of protein and oil biosynthesis.
Keywords: candidate genes; genome-wide association study (GWAS); linkage analysis; oil content; protein content; quantitative trait loci (QTL); soybean
Year: 2019 PMID: 31775326 PMCID: PMC6928826 DOI: 10.3390/ijms20235915
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Descriptive statistics and variance analysis for protein and oil content of two panels in multiple environments.
| Population | Trait | Environment a | HH27 b (%) | ZGDD b (%) | Means (%) | Variance | Range (%) | CV c (%) | Skewness | Kurtosis | F Value: Genotype (G) | F Value: Environment (E) | F Value: G*E | H2 d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RILs g | Protein | 16SY | 38.55 | 45.33 | 41.54 ± 0.12 | 4.52 | 36.29~48.64 | 5.12 | 0.49 | 0.11 | 14.33 *** f | 47.63 *** | 2.75 *** | 0.83 |
| | | 17SY | 38.94 | 45.17 | 41.89 ± 0.14 | 6.33 | 35.90~48.00 | 6.00 | 0.15 | −0.46 | | | | |
| | | 18SY | 40.15 | 45.01 | 41.93 ± 0.14 | 5.65 | 33.48~48.50 | 5.67 | 0.07 | 0.20 | | | | |
| | | 17XT | 44.65 | 43.68 | 41.58 ± 0.11 | 4.03 | 37.35~48.38 | 4.83 | 0.16 | −0.17 | | | | |
| | | 17XX | NA e | NA | 42.62 ± 0.12 | 4.14 | 37.53~47.29 | 4.77 | −0.10 | −0.34 | | | | |
| | | 18XX | 41.74 | 43.57 | 41.48 ± 0.17 | 5.14 | 33.19~47.23 | 5.46 | −0.18 | 0.72 | | | | |
| | Oil | 16SY | 21.40 | 20.21 | 21.13 ± 0.06 | 1.17 | 18.13~24.38 | 5.11 | 0.08 | −0.08 | 18.27 *** | 1533.15 *** | 2.57 *** | 0.87 |
| | | 17SY | 20.53 | 19.13 | 19.93 ± 0.07 | 1.47 | 16.12~23.17 | 6.09 | 0.00 | −0.18 | | | | |
| | | 18SY | 20.73 | 19.43 | 20.54 ± 0.07 | 1.37 | 17.49~24.26 | 5.69 | 0.16 | 0.04 | | | | |
| | | 17XT | 18.83 | 18.91 | 20.38 ± 0.06 | 1.18 | 16.76~23.14 | 5.32 | −0.20 | −0.14 | | | | |
| | | 17XX | NA | NA | 18.71 ± 0.06 | 1.12 | 16.19~21.44 | 5.65 | −0.04 | −0.49 | | | | |
| | | 18XX | 18.86 | 15.77 | 17.81 ± 0.09 | 1.39 | 15.13~20.90 | 6.62 | 0.17 | −0.09 | | | | |
| Accessions | Protein | 18SY | - | - | 42.02 ± 0.23 | 12.42 | 33.40~51.33 | 8.39 | 0.19 | −0.45 | 20.28 *** | 3.94 ** | 3.15 *** | 0.86 |
| | | 17XT | - | - | 42.21 ± 0.17 | 7.08 | 35.51~49.22 | 6.30 | −0.01 | −0.33 | | | | |
| | | 17XX | - | - | 42.10 ± 0.20 | 8.83 | 36.10~48.83 | 7.06 | 0.22 | −0.66 | | | | |
| | | 18XX | - | - | 42.46 ± 0.21 | 8.58 | 30.68~49.84 | 6.90 | −0.51 | 1.01 | | | | |
| | Oil | 18SY | - | - | 20.73 ± 0.11 | 2.84 | 15.65~23.94 | 8.12 | −0.54 | −0.33 | 27.54 *** | 535.37 *** | 2.95 *** | 0.90 |
| | | 17XT | - | - | 21.14 ± 0.09 | 2.04 | 17.78~25.35 | 6.75 | 0.10 | −0.25 | | | | |
| | | 17XX | - | - | 20.10 ± 0.10 | 2.31 | 16.14~23.53 | 7.56 | −0.28 | −0.48 | | | | |
| | | 18XX | - | - | 19.20 ± 0.12 | 2.63 | 15.29~22.93 | 8.44 | −0.03 | −0.54 | | | | |
a 16SY, 17SY, 18SY, 17XT, 17XX, 18XX—different environments of Sanya, Xiangtan, Xinxiang in 2016, 2017, and 2018. b HH27—Heihe27; ZGDD—Zigongdongdou. c CV—coefficient of variation. d H2—broad-sense heritability. e NA—not available. f ** p < 0.01; *** p < 0.001. g RILs—recombinant inbred lines.
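The CV column can be cross-checked against the mean and variance columns. The sketch below assumes the usual definition, CV = standard deviation / mean × 100, and uses three RIL protein rows copied from the table. The function name is illustrative, and small differences from the reported values are expected because the table entries are rounded.

```python
import math


def coefficient_of_variation(mean: float, variance: float) -> float:
    """Return the CV in percent: standard deviation / mean * 100."""
    return math.sqrt(variance) / mean * 100.0


# (mean %, variance, reported CV %) for RIL protein content, copied from the table.
ril_protein_rows = {
    "16SY": (41.54, 4.52, 5.12),
    "17SY": (41.89, 6.33, 6.00),
    "18SY": (41.93, 5.65, 5.67),
}

for env, (mean, variance, reported_cv) in ril_protein_rows.items():
    cv = coefficient_of_variation(mean, variance)
    print(f"{env}: computed CV = {cv:.2f}%, reported CV = {reported_cv:.2f}%")
```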
Figure 1. Histograms of protein content (a) and oil content (b) of the recombinant inbred lines (RILs) in six environments. 16SY, 17SY, 18SY, 17XT, 17XX, and 18XX represent the environments of Sanya, Xiangtan, and Xinxiang in 2016, 2017, and 2018. Zigongdongdou (ZGDD) and Heihe27 (HH27) are the parents of the RILs. Bars in different colors represent different protein/oil content.
QTL co-detected in linkage analysis by both algorithms and/or in multiple growth environments.
| Trait | QTL Name | Chr. (LG) a | Method b | Location (cM) | Marker Interval (cM) | Physical Region (bp) | LOD/F Value c | PVE (%) d | Additive Effect e | Environment f | Reference g |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Protein | | 1 (D1a) | ICIM | 1 | 0~1.5 | 1488983~1566969 | 2.90~6.49 | 2.73~5.52 | −0.41~−0.49 | 1, 3 | Seed protein 3-4 |
| | | | MCIM | 7.5 | 6.5~8.4 | 2605140~2852655 | 5.00 | - | −0.24 | - | |
| | | 6 (C2) | ICIM | 33 | 32.5~33.5 | 5836780~5931027 | 4.93 | 4.17 | 0.42 | 1 | cqSeed protein-005, Seed protein 30-5 |
| | | | MCIM | 32.1 | 31.7~32.2 | 5609477~5632020 | 5.10 | - | 0.26 | - | |
| | | 9 (K) | ICIM | 62~68 | 59.5~70.5 | 38117239~41020511 | 3.63~8.96 | 3.43~8.40 | 0.38~0.41 | 1, 2, 4 | Seed protein 33-3, Seed protein 34-6 |
| | | | MCIM | 61.7 | 60.7~62.7 | 38117239~39894925 | 10.40 | - | 0.42 | - | |
| | | 12 (H) | ICIM | 105~107 | 103.5~107 | 38776571~39867556 | 3.64~3.78 | 3.05~3.58 | 0.36~0.46 | 1, 3 | Seed protein 6-1 |
| | | 15 (E) | ICIM | 23~26 | 22.5~26.5 | 2691560~3476238 | 9.00~19.05 | 13.40~17.81 | 0.79~0.89 | 3, 4, 5 | Seed protein 30-3 |
| | | | MCIM | 26.2 | 26.1~26.3 | 3311604~3350307 | 26.20 | - | 0.52 | - | |
| | | 18 (G) | ICIM | 22 | 21.5~22.5 | 5577815~5618246 | 6.06 | 5.80 | −0.59 | 3 | Seed protein 47-6 |
| | | | MCIM | 22.3 | 22.0~23.3 | 5618246~5979842 | 4.80 | - | −0.24 | - | |
| | | 20 (I) | ICIM | 54~61 | 48.5~62.5 | 34734798~37115770 | 6.14~8.62 | 7.24~9.39 | 0.56~0.75 | 1, 2, 3 | Seed protein 26-5, Seed protein 34-11 |
| | | | MCIM | 58.7 | 57.7~59.7 | 36089907~37115770 | 15.30 | - | 0.34 | - | |
| Oil | | 1 (D1a) | ICIM | 1~10 | 0~14.5 | 1488983~3316074 | 2.85~2.99 | 1.56~1.67 | 0.14~0.17 | 1, 3 | Seed oil 23-2 |
| | | 2 (D1b) | ICIM | 121 | 116.5~126.5 | 43783867~45442501 | 2.56 | 3.64 | 0.19 | 5 | cqSeed oil-014, Seed oil 39-6 |
| | | | MCIM | 112.8 | 111.8~113.1 | 42545649~43226016 | 5.40 | - | 0.14 | - | |
| | | 3 (N) | ICIM | 52~54 | 50.5~55.5 | 33430615~34447425 | 2.62~2.72 | 1.86~2.78 | 0.15~0.21 | 2, 4 | Seed oil 43-30 |
| | | 5 (A1) | ICIM | 117~126 | 116.5~126 | 40003403~41813079 | 3.89~35.35 | 7.04~23.98 | −0.27~−0.63 | 1, 2, 3, 4, 5, 6 | Seed oil 39-1, Seed oil 35-2, Seed oil 13-1 |
| | | | MCIM | 125.9 | 124.9~126.4 | 40566361~41813079 | 25.10 | - | −0.40 | - | |
| | | 6 (C2) | ICIM | 50~52 | 44.5~52.5 | 8313637~9652882 | 3.24~4.09 | 4.16~6.68 | −0.26~−0.34 | 2, 6 | cqSeed oil-016 |
| | | 15 (E) | ICIM | 26 | 25.5~26.5 | 2691560~3240013 | 19.25 | 15.97 | −0.44 | 4 | cqSeed oil-007, Seed oil 2-3 |
| | | | MCIM | 26.2 | 26.1~26.3 | 3311604~3350307 | 28.40 | - | −0.28 | - | |
| | | 17 (D2) | ICIM | 41 | 39.5~41.5 | 7100839~8674575 | 3.19 | 1.72 | 0.18 | 3 | Seed oil 23-3 |
| | | | MCIM | 45.1 | 44.1~46.1 | 7453724~9120650 | 5.30 | - | 0.13 | - | |
| | | 20 (I) | ICIM | 56~62 | 51.5~62.5 | 34734798~37115770 | 3.93~5.20 | 2.30~2.87 | −0.20 | 1, 3 | Seed oil 27-4, Seed oil 24-6 |
a Chr. (LG), chromosome (linkage group). b Inclusive composite interval mapping (ICIM) and mixed-model-based composite interval mapping (MCIM) were used. c The logarithm of odds (LOD) value is the test statistic reported by ICIM and the F value is the test statistic reported by MCIM, with critical thresholds of LOD = 2.5 and F = 4.7, respectively. d PVE, phenotypic variation explained. e A positive value means that the ZGDD allele contributed to the trait. f 1, 2, 3, 4, 5, 6 represent 16SY, 17SY, 18SY, 17XT, 17XX, 18XX, respectively. g Previously reported quantitative trait loci (QTL) in the SoyBase database (https://www.soybase.org/) that overlap the QTL detected here.
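The LOD values can also be read on a likelihood-ratio scale. The sketch below assumes that LOD denotes the logarithm-of-odds score that is standard in QTL mapping, that is, the base-10 logarithm of the ratio between the likelihood with a QTL at a position and the likelihood without one. The function names are illustrative and are not taken from the ICIM or MCIM software.

```python
import math


def lod_to_likelihood_ratio(lod: float) -> float:
    """Convert a LOD score to the likelihood ratio it represents (10 ** LOD)."""
    return 10.0 ** lod


def likelihood_ratio_to_lod(ratio: float) -> float:
    """Convert a likelihood ratio back to a LOD score (log10 of the ratio)."""
    return math.log10(ratio)


# Critical threshold used for ICIM in the table footnote.
threshold_lod = 2.5
print(f"LOD {threshold_lod} corresponds to a likelihood ratio of "
      f"{lod_to_likelihood_ratio(threshold_lod):.1f}")
```

Under this reading, the ICIM threshold of LOD = 2.5 asks that the data be roughly 316 times more likely with a QTL at the tested position than without one.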
Figure 2. Location of quantitative trait loci (QTL) related to protein and oil content. QTL for protein content are shown in red and QTL for oil content in blue.
Figure 3. Principal component (PC) analysis (a,b), population structure analysis (c), heat map of the kinship matrix of the 203 soybean accessions (d), and linkage disequilibrium (LD) decay (e) of the association panel.
SNP loci regions co-detected by linkage analysis and GWAS.
| Chr. a | Trait | Method b | Environment c | Markers Interval (cM)/SNP Number d | SNP Loci Region/Location (bp) | LOD/F Value e | PVE (%) f | Additive Effect g |
|---|---|---|---|---|---|---|---|---|
| 2 | Oil | ICIM | 5 | 116.5~126.5 | 43783867~45442501 | 2.56 | 3.64 | 0.19 |
| | | GWAS | 4 | 1 | 45017225 | - | - | - |
| 6 | Protein | ICIM | 1 | 32.5~33.5 | 5836780~5931027 | 4.93 | 4.17 | 0.42 |
| | | MCIM | - | 31.7~32.2 | 5609477~5632020 | 5.10 | - | 0.26 |
| | Oil | GWAS | 3, 5 | 2 | 5713084~5992538 | - | - | - |
| 9 | Protein | ICIM | 1, 2, 4 | 59.5~70.5 | 38117239~41020511 | 3.63~8.96 | 3.43~8.40 | 0.38~0.41 |
| | | MCIM | - | 60.7~62.7 | 38117239~39894925 | 10.40 | - | 0.42 |
| | Oil | GWAS | 6 | 1 | 40301013 | - | - | - |
| 20 | Protein | ICIM | 1, 2, 3 | 48.5~62.5 | 34734798~37115770 | 6.14~8.62 | 7.24~9.39 | 0.56~0.75 |
| | | MCIM | - | 57.7~59.7 | 36089907~37115770 | 15.30 | - | 0.34 |
| | | GWAS | 5, 6 | 5 | 34990940~35578946 | - | - | - |
| | Oil | ICIM | 1, 3 | 51.5~62.5 | 34734798~37115770 | 3.93~5.20 | 2.30~2.87 | −0.20 |
| | | GWAS | 5 | 4 | 34801441~35512580 | - | - | - |
a Chr., chromosome. b Inclusive composite interval mapping (ICIM) and mixed-model-based composite interval mapping (MCIM) were the two algorithms used in linkage analysis. c 1, 2, 3, 4, 5, 6 represent 16SY, 17SY, 18SY, 17XT, 17XX, 18XX, respectively. d Markers interval is the QTL interval in linkage analysis; SNP number is the number of significant SNP loci in the SNP loci region. e The LOD value is the test statistic reported by ICIM and the F value is the test statistic reported by MCIM, with critical thresholds of LOD = 2.5 and F = 4.7, respectively. f PVE, phenotypic variation explained. g A positive value means that the ZGDD allele contributed to the trait.
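Co-detection in this table amounts to a GWAS SNP region sharing physical coordinates with a linkage QTL interval on the same chromosome. The sketch below shows one simple way to test that with a closed-interval overlap check, using coordinates copied from the table. The `Region` type and `overlaps` function are hypothetical helpers written for illustration, not the procedure used by the authors.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: int
    start: int  # bp, inclusive
    end: int    # bp, inclusive


def overlaps(a: Region, b: Region) -> bool:
    """True if two closed intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# Coordinates copied from the table; a single SNP is a region with start == end.
qtl_chr20_icim = Region(20, 34734798, 37115770)
gwas_chr20_protein = Region(20, 34990940, 35578946)
qtl_chr9_icim = Region(9, 38117239, 41020511)
qtl_chr9_mcim = Region(9, 38117239, 39894925)
gwas_chr9_oil_snp = Region(9, 40301013, 40301013)

print(overlaps(qtl_chr20_icim, gwas_chr20_protein))
print(overlaps(qtl_chr9_icim, gwas_chr9_oil_snp))
print(overlaps(qtl_chr9_mcim, gwas_chr9_oil_snp))
```

With these coordinates, the chromosome 9 SNP falls inside the wider ICIM interval but outside the narrower MCIM interval, which shows how the choice of interval affects what counts as co-detected.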
Candidate genes that may control protein/oil content within the SNP regions on Chr. 15 and 20.
| Trait | Gene | Start (bp) | Stop (bp) | Annotation |
|---|---|---|---|---|
| Oil | | 2722009 | 2727957 | acyltransferase activity, diacylglycerol and triacylglycerol biosynthesis |
| | | 2740960 | 2746344 | aldehyde dehydrogenase family, aldehyde dehydrogenase [NAD(P)+] activity |
| | | 2765299 | 2770528 | drug transmembrane transport, associated with the transport of citric acid and malic acid |
| | | 3339156 | 3341447 | fatty acid, lipid biosynthetic process, transferase activity, 3-oxoacyl-[acyl-carrier-protein] synthase activity |
| | | 35025241 | 35029762 | Arabidopsis phospholipase-like protein, regulation of gene expression |
| | | 35038552 | 35042864 | hydroxypyruvate reductase, glycerate dehydrogenase, glyoxylate reductase, NADP activity |
| | | 35116048 | 35118928 | mitochondrial pyruvate transmembrane transport |
| | | 35222837 | 35228540 | lipid metabolic process, steroid biosynthetic process, mevalonate pathway |
| | | 35229423 | 35231861 | acetyltransferase activity |
| | | 35315630 | 35319063 | fatty acid desaturase, lipid metabolic process |
| Protein | | 2656030 | 2657795 | structural constituent of ribosome, 28S ribosomal protein |
| | | 3068347 | 3075209 | 60S ribosomal protein |
| | | 3164697 | 3168839 | ACT domain-containing protein, metabolic process like protein synthesis and degradation |
| | | 3255042 | 3256599 | ribosomal large subunit assembly, 60S ribosomal protein L23 |
| | | 3307111 | 3308840 | structural constituent of ribosome, 60S ribosomal protein L35 |
| | | 34605252 | 34609867 | tryptophan biosynthetic process, anthranilate synthase activity |
| | | 34757381 | 34771672 | ACT-like protein, serine/threonine kinase family protein |
| | | 34862155 | 34865242 | amino acid transmembrane transport |
| | | 34962043 | 34967985 | translation initiation factor 3 (IF-3) family protein |
| | | 35200934 | 35205885 | ubiquitin-dependent protein catabolic process, proteasome complex, proteolysis activity |
| | | 35232344 | 35233758 | nutrient reservoir activity, cupins superfamily protein, storage protein |
| | | 35261156 | 35268971 | ACT domain-containing protein, metabolic process like protein synthesis and degradation |
| | | 35396205 | 35400722 | cationic amino acid transporter, amino acid transmembrane transporter activity |
| Protein/Oil | | 34935548 | 34940516 | protein dephosphorylation, phosphatase activity, pyruvate dehydrogenase |
| | | 35235204 | 35239070 | lipoate biosynthetic, radical SAM superfamily protein, transferase activity |
ACT, aspartate kinase, chorismate mutase and TyrA (prephenate dehydrogenase); SAM, S-adenosyl methionine; NADP, nicotinamide adenine dinucleotide phosphate; IF, initiation factor.
Figure 4. Relative expression patterns of candidate genes. Glyma.15g033200, Glyma.15g034100, Glyma.20g105300, Glyma.20g106900, and Glyma.20g107600 are shown in (a), Glyma.15g034600 and Glyma.20g103200 in (b), and Glyma.15g040100 in (c). * p < 0.05, ** p < 0.01.