Yia Yang1, Thang C La1, Jason D Gillman2, Zhen Lyu3, Trupti Joshi4, Mariola Usovsky1, Qijian Song5, Andrew Scaboo1.
Abstract
Modern soybean [Glycine max (L.) Merr] cultivars have low overall genetic variation due to repeated bottleneck events that arose during domestication and from selection strategies typical of many soybean breeding programs. In both public and private soybean breeding programs, the introgression of wild soybean (Glycine soja Siebold and Zucc.) alleles is a viable option to increase genetic diversity and identify new sources for traits of value. The objective of our study was to examine the genetic architecture responsible for seed protein and oil using a recombinant inbred line (RIL) population derived from hybridizing a G. max line ('Osage') with a G. soja accession (PI 593983). Linkage mapping identified a total of seven significant quantitative trait loci (QTL) on chromosomes 14 and 20 for seed protein and on chromosome 8 for seed oil, with LOD scores ranging from 5.3 to 31.7 for seed protein content and from 9.8 to 25.9 for seed oil content. We analyzed 3,015 single F4:9 soybean plants to develop two residual heterozygote-derived near-isogenic line (RHD-NIL) populations by targeting nine SNP markers from genotyping-by-sequencing, which corresponded to two novel QTL derived from G. soja: one for seed oil on chromosome 8 and another for seed protein on chromosome 14. Single marker analysis and linkage analysis using 50 RHD-NILs validated the chromosome 14 protein QTL, and whole genome sequencing of RHD-NILs allowed us to reduce the QTL interval from ∼16.5 to ∼4.6 Mbp. Based on recombination events, we identified two genomic regions that had significant increases of 0.65 and 0.72% in seed protein content without a significant decrease in seed oil content. A new Kompetitive allele-specific polymerase chain reaction (KASP) assay, which will be useful for introgression of this trait into modern elite G. max cultivars, was developed in one region. Within the significantly associated genomic regions, a total of eight genes are considered candidate genes, based on the presence of gene annotations associated with protein or amino acid metabolism/movement. Our results provide insight into the use of wild soybean as a source of genetic diversity and native traits for soybean cultivar improvement.
Keywords: Glycine soja; QTL; seed oil; seed protein; wild soybean (Glycine soja Sieb. and Zucc.)
Year: 2022 PMID: 35968122 PMCID: PMC9372550 DOI: 10.3389/fpls.2022.938100
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
FIGURE 6. Performance of the developed KASP assay ED-5 for detection of the protein QTL on Chr. 14. Endpoint fluorescence scatter plots of (A) the RHD-NIL population and (B) the BC1F1 breeding population. Lines called with the allele-specific HEX primer (mutant; MUT) are displayed in green, lines called with the allele-specific FAM primer (wild type; WT) are displayed in blue, and heterozygous (HET) lines are marked in red. The X-axis displays fluorescence of FAM at 523–568 nm, and the Y-axis displays fluorescence of HEX at 483–533 nm. The molecular marker assay for Gm14_8059955 is displayed above the plots and contains the forward FAM allele X (wild type; WT) primer, the forward HEX allele Y (mutant; MUT) primer, and the reverse primer.
FIGURE 1. Quantitative trait loci (QTL) LOD score traces for seed oil (green) and protein (blue) content in a population of Osage × PI 593983 across four environments during 2016 and 2017 in Missouri. The vertical axis indicates the genetic map position along the chromosome. The horizontal axis represents the logarithm of the odds (LOD) score. The black dotted line indicates the threshold of significance (LOD = 6.0). (A) Chr8. (B) Chr14. (C) Chr20.
Genetic map distribution of GBS markers for the ‘Osage’ × PI 593983 RIL population.
| Chr | Number of markers: After | Number of markers: Parental | Number of markers: Follow the rule of | Number of markers: After removal | Length (cM) | Average spacing (cM) | Max spacing (cM) |
| 1 | 1,285 | 421 | 44 | 39 | 63.1 | 1.7 | 23.4 |
| 2 | 1,376 | 558 | 275 | 270 | 159.3 | 0.6 | 60.5 |
| 3A | | | | 365 | 8.4 | 0.4 | 1.8 |
| 3B | 1,707 | 909 | 370 | 195 | 53.1 | 0.3 | 4.9 |
| 3C | | | | 146 | 47 | 0.3 | 5.1 |
| 4 | 1,464 | 557 | 264 | 210 | 74.5 | 0.4 | 8.5 |
| 5 | 1,198 | 486 | 231 | 216 | 111 | 0.5 | 20.1 |
| 6 | 1,249 | 392 | 165 | 156 | 57.2 | 0.4 | 3 |
| 7 | 1,207 | 351 | 143 | 137 | 95.5 | 0.7 | 24.3 |
| 8 | 1,565 | 650 | 241 | 237 | 143.9 | 0.6 | 67.6 |
| 9 | 1,209 | 368 | 14 | 10 | 61.5 | 6.8 | 31 |
| 10 | 1,219 | 362 | 180 | 177 | 113 | 0.6 | 34.2 |
| 11 | 1,357 | 614 | 294 | 253 | 123.9 | 0.5 | 16 |
| 12 | 1,329 | 685 | 331 | 312 | 102.3 | 0.3 | 6.4 |
| 13A | 1,386 | 514 | 220 | 113 | 32.3 | 0.3 | 1.9 |
| 13B | | | | 82 | 29.7 | 0.4 | 2.5 |
| 14 | 1,973 | 1,245 | 546 | 522 | 177 | 0.3 | 7.8 |
| 15 | 981 | 182 | 55 | 56 | 57.8 | 1.1 | 19 |
| 16 | 1,387 | 693 | 314 | 284 | 123.4 | 0.4 | 6.8 |
| 17 | 1,110 | 394 | 221 | 212 | 75.8 | 0.4 | 3.2 |
| 18 | 1,391 | 397 | 178 | 166 | 99.9 | 0.6 | 59.5 |
| 19 | 883 | 75 | 27 | 22 | 67.9 | 3.2 | 50.5 |
| 20 | 1,972 | 1,172 | 539 | 535 | 173.6 | 0.3 | 6 |
| Overall | 27,248 | 11,025 | 4,652 | 4,374 | 2,051.2 | | |
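The average spacing column appears to equal the map length divided by the number of intervals between adjacent markers (markers after removal minus one). This relationship is inferred from the tabled numbers and is not stated in the source; the function name `average_spacing` is illustrative. A minimal Python sketch that checks it against a few rows:

```python
def average_spacing(length_cm, n_markers):
    """Mean distance between adjacent markers: map length / number of intervals."""
    if n_markers < 2:
        raise ValueError("need at least two markers to define an interval")
    return length_cm / (n_markers - 1)


# (chromosome, markers after removal, length in cM, tabled average spacing in cM)
ROWS = [
    ("1", 39, 63.1, 1.7),
    ("9", 10, 61.5, 6.8),
    ("15", 56, 57.8, 1.1),
    ("19", 22, 67.9, 3.2),
]

for chrom, n_markers, length_cm, tabled in ROWS:
    computed = round(average_spacing(length_cm, n_markers), 1)
    print(f"Chr {chrom}: computed {computed} cM, tabled {tabled} cM")
```

Chromosomes 9 and 19 illustrate why spacing matters for QTL resolution: with only 10 and 22 markers retained, their average spacing is several centimorgans, far coarser than on Chr 14 or Chr 20.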
Descriptive statistics (minimum, maximum, mean, standard deviation [SD], coefficient of variation [CV], skewness, and kurtosis) and least-squares mean groupings of seed oil and protein across environments.
| Traits | Environment | Min | Max | Mean | SD | CV (%) | Skewness | Kurtosis | Groups |
| Oil | 18/19GH | 20.1 | 22.1 | 21.1 | 0.51 | 2.41 | 0.08 | –0.80 | a |
| | 19CLM | 17.8 | 19.8 | 18.9 | 0.45 | 2.41 | –0.05 | –0.27 | b |
| | 19NOV | 17.8 | 20.3 | 19.0 | 0.59 | 3.10 | 0.16 | –0.39 | b |
| | 20CLM | 16.2 | 17.9 | 17.2 | 0.43 | 2.49 | –0.32 | –0.48 | c |
| | 20NOV | 16.2 | 18.3 | 17.1 | 0.48 | 2.83 | 0.27 | –0.45 | c |
| | CLM&NOV | 17.4 | 18.9 | 18.0 | 0.36 | 1.98 | 0.18 | –0.51 | |
| | Combined | 18.0 | 19.5 | 18.6 | 0.34 | 1.82 | 0.27 | –0.14 | |
| Protein | 18/19GH | 38.2 | 44.8 | 41.4 | 1.47 | 3.55 | –0.10 | –0.12 | d |
| | 19CLM | 42.8 | 45.7 | 44.1 | 0.83 | 1.87 | 0.23 | –1.03 | a |
| | 19NOV | 41.4 | 46.6 | 43.8 | 0.96 | 2.20 | 0.50 | 1.31 | bc |
| | 20CLM | 42.9 | 45.2 | 44.0 | 0.60 | 1.36 | –0.11 | –0.80 | ab |
| | 20NOV | 42.7 | 45.4 | 43.8 | 0.73 | 1.66 | 0.46 | –0.53 | c |
| | CLM&NOV | 42.8 | 45.2 | 44.0 | 0.60 | 1.36 | 0.23 | –0.83 | |
| | Combined | 43.0 | 44.9 | 43.4 | 0.69 | 1.59 | 0.11 | –0.81 | |
(a) Five environments: 2018/2019 greenhouse (18/19GH), 2019 Columbia (19CLM), 2019 Novelty (19NOV), 2020 Columbia (20CLM), and 2020 Novelty (20NOV).
(b) CLM&NOV: combined seed oil and protein content from four field environments (19CLM, 19NOV, 20CLM, and 20NOV).
(c) Combined: combined seed oil and protein content from five environments (18/19GH, 19CLM, 19NOV, 20CLM, and 20NOV).
(d) Groups: grouping of least-squares means.
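The CV column can be cross-checked from the mean and the standard deviation, since the coefficient of variation is the SD expressed as a percentage of the mean. The sketch below assumes the unlabeled value between the mean and CV in each row is the SD; because the tabled inputs are rounded, agreement is only expected to within a few hundredths of a percent.

```python
def coefficient_of_variation(sd, mean):
    """CV in percent: standard deviation relative to the mean."""
    return 100.0 * sd / mean


# (trait, environment, mean, SD, tabled CV %), copied from the table above
ROWS = [
    ("Oil", "18/19GH", 21.1, 0.51, 2.41),
    ("Oil", "19NOV", 19.0, 0.59, 3.10),
    ("Protein", "18/19GH", 41.4, 1.47, 3.55),
    ("Protein", "Combined", 43.4, 0.69, 1.59),
]

for trait, env, mean, sd, tabled in ROWS:
    cv = coefficient_of_variation(sd, mean)
    # Tabled means and SDs are rounded, so small discrepancies are expected.
    print(f"{trait} {env}: CV = {cv:.2f}% (tabled {tabled:.2f}%)")
```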
Analysis of variance summary for seed protein and seed oil with heritability (h2) on an entry-mean basis.
| Source of variance | Df | Protein: Mean Sq | Protein: F value | Protein: Pr(>F) | Oil: Mean Sq | Oil: F value | Oil: Pr(>F) |
| Genotype (G) | 49 | 2.27 | 4.91 | 9.99E-15 | 0.81 | 4.21 | 3.23E-12 |
| Environment (E) | 3 | 1.86 | 4.03 | 8.46E-03 | 92.68 | 480.91 | < 2.22E-15 |
| Genotype × Environment (G×E) | 139 | 0.64 | 1.38 | 2.40E-02 | 0.25 | 1.28 | 6.61E-02 |
| Replications in Environment | 4 | 1.66 | 3.61 | 7.15E-03 | 0.07 | 0.37 | 8.29E-01 |
| Residual | 161 | 0.46 | | | 0.19 | | |
| Heritability (h2) | | 0.72 | | | 0.69 | | |
*Indicates significance at the 0.05 level (P < 0.05).
**Indicates significance at the 0.01 level (P < 0.01).
***Indicates significance at the 0.001 level (P < 0.001).
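The ANOVA quantities can be related to one another with a short calculation. The sketch below assumes that each F value is the effect mean square divided by the residual mean square, and that entry-mean heritability is estimated as (MS_G − MS_G×E) / MS_G. The source does not state which estimator was used; this one is a common choice and happens to reproduce the tabled 0.72 and 0.69. Function names are illustrative.

```python
# Mean squares copied from the ANOVA table above.
MEAN_SQUARES = {
    "protein": {"G": 2.27, "GxE": 0.64, "residual": 0.46},
    "oil": {"G": 0.81, "GxE": 0.25, "residual": 0.19},
}


def f_value(ms_effect, ms_residual):
    """F statistic for an effect tested against the residual mean square."""
    return ms_effect / ms_residual


def entry_mean_heritability(ms_genotype, ms_gxe):
    """Assumed estimator: h2 = (MS_G - MS_GxE) / MS_G (not confirmed by the source)."""
    return (ms_genotype - ms_gxe) / ms_genotype


for trait, ms in MEAN_SQUARES.items():
    f_g = f_value(ms["G"], ms["residual"])
    h2 = entry_mean_heritability(ms["G"], ms["GxE"])
    # F differs slightly from the table because the tabled mean squares are rounded.
    print(f"{trait}: F(genotype) = {f_g:.2f}, h2 = {h2:.2f}")
```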
Pearson correlation coefficient between seed oil and protein in the high protein RHD-NIL population across multiple environments.
| Environment | Trait | 18/19GH Oil | 18/19GH Protein | 19CLM Oil | 19CLM Protein | 19NOV Oil | 19NOV Protein | 20CLM Oil | 20CLM Protein | 20NOV Oil | 20NOV Protein | CLM&NOV Oil | CLM&NOV Protein | Combined Oil | Combined Protein |
| 18/19GH | Oil | 1 | | | | | | | | | | | | | |
| | Protein | –0.75 | 1 | | | | | | | | | | | | |
| 19CLM | Oil | 0.46 | –0.49 | 1 | | | | | | | | | | | |
| | Protein | –0.48 | 0.45 | –0.77 | 1 | | | | | | | | | | |
| 19NOV | Oil | 0.27ns | –0.38 | 0.29ns | –0.29ns | 1 | | | | | | | | | |
| | Protein | –0.41 | 0.48 | –0.28ns | 0.38 | –0.78 | 1 | | | | | | | | |
| 20CLM | Oil | 0.42 | –0.38 | 0.55 | –0.39 | 0.32 | –0.31 | 1 | | | | | | | |
| | Protein | –0.40 | 0.54 | –0.53 | 0.53 | –0.27ns | 0.39 | –0.55 | 1 | | | | | | |
| 20NOV | Oil | 0.57 | –0.58 | 0.54 | –0.50 | 0.25ns | –0.37 | 0.39 | –0.70 | 1 | | | | | |
| | Protein | –0.51 | 0.49 | –0.54 | 0.60 | –0.22ns | 0.36 | –0.34 | 0.55 | –0.70 | 1 | | | | |
| CLM&NOV | Oil | 0.58 | –0.62 | 0.79 | –0.65 | 0.68 | –0.63 | 0.74 | –0.68 | 0.73 | –0.60 | 1 | | | |
| | Protein | –0.59 | 0.63 | –0.68 | 0.81 | –0.55 | 0.73 | –0.50 | 0.75 | –0.70 | 0.79 | –0.83 | 1 | | |
| Combined | Oil | 0.74 | –0.66 | 0.79 | –0.65 | 0.51 | –0.52 | 0.76 | –0.67 | 0.74 | –0.60 | 0.94 | –0.78 | 1 | |
| | Protein | –0.72 | 0.83 | –0.69 | 0.75 | –0.50 | 0.66 | –0.52 | 0.76 | –0.74 | 0.76 | –0.83 | 0.94 | –0.85 | 1 |
In the heatmap in the upper-right corner, red indicates high correlation and white indicates no correlation.
(a) CLM&NOV: combined data from four field study environments.
(b) Combined: combined data from five environments.
*Indicates significance at the 0.05 level (P < 0.05).
**Indicates significance at the 0.01 level (P < 0.01).
***Indicates significance at the 0.001 level (P < 0.001).
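Each cell in the correlation table is a Pearson coefficient between two vectors of line means. A minimal sketch of the calculation follows; the oil and protein values are hypothetical numbers chosen only to illustrate a negative relationship and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical line means (percent), NOT taken from the study.
oil = [17.0, 17.5, 18.0, 18.5, 19.0]
protein = [45.0, 44.6, 44.1, 43.9, 43.2]

print(f"r(oil, protein) = {pearson_r(oil, protein):.2f}")
```

In the table, the within-environment oil and protein coefficients are all negative (for example, –0.75 in 18/19GH and –0.77 in 19CLM), which is the pattern the toy data mimic.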
FIGURE 2. Genetic similarity between individual RHD-NILs and the parental lines, shown as a heatmap. Red indicates a genetic similarity of 1.0, light red indicates 0.90, and light pink indicates less than 0.50. Osage represents parent one and PI 593983 represents parent two.
FIGURE 3. Distribution of markers across the Chr. 14 protein QTL on the physical map. (A) Five genotyping-by-sequencing (GBS) markers in the initial RIL population. (B) Fifty-one SoySNP6K markers in the RHD-NIL population. (C) Eight WGR markers in the RHD-NIL population. The eight recombination regions are indicated on the physical map.
The eight recombination regions for seed protein and oil on Chr. 14.
| Trait | Chr | Recomb. Region | Marker Interval | Position (cM) | R2 (%) | F-value |
| Protein content | 14 | Region-1 | Gm14_5509372-Gm14_6485179 | 0.00 | 10.47 | 0.39 |
| | | Region-2 | Gm14_6487608-Gm14_7138691 | 3.32 | 13.13 | 0.89 |
| | | Region-3 | Gm14_7141628-Gm14_7453099 | 5.46 | | |
| | | Region-4 | Gm14_7455192-Gm14_8048870 | 16.06 | 14.65 | 0.07 |
| | | Region-5 | Gm14_8059955-Gm14_9506311 | 23.41 | | |
| | | Region-6 | Gm14_9508613-Gm14_12648760 | 25.57 | | |
| | | Region-7 | Gm14_12655776-Gm14_14976378 | 36.20 | 17.99 | 2.02 |
| | | Region-8 | Gm14_14976378-Gm14_44140803 | 37.26 | 16.75 | 0.49 |
| Oil content | 14 | Region-1 | Gm14_5509372-Gm14_6485179 | 0.00 | 3.56 | 2.77 |
| | | Region-2 | Gm14_6487608-Gm14_7138691 | 3.32 | 2.89 | 0.21 |
| | | Region-3 | Gm14_7141628-Gm14_7453099 | 5.46 | 2.94 | 0.47 |
| | | Region-4 | Gm14_7455192-Gm14_8048870 | 16.06 | 2.82 | 0.51 |
| | | Region-5 | Gm14_8059955-Gm14_9506311 | 23.41 | 3.60 | 1.24 |
| | | Region-6 | Gm14_9508613-Gm14_12648760 | 25.57 | 4.51 | 1.93 |
| | | Region-7 | Gm14_12655776-Gm14_14976378 | 36.20 | 4.81 | 2.57 |
| | | Region-8 | Gm14_14976378-Gm14_44140803 | 37.26 | 4.23 | 0.70 |
(a) Chromosome number.
(b) Name of the recombination regions for protein and oil.
(c) Marker interval of the recombination regions; Gm14 represents Chr 14 and the following number represents the physical position.
(d) Variation explained for protein and oil (R2) in percentage.
(e) ANOVA F-value.
*Indicates significance at the 0.1 level (P < 0.1).
**Indicates significance at the 0.05 level (P < 0.05).
***Indicates significance at the 0.01 level (P < 0.01).
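Because each marker name encodes a chromosome and a physical position, the physical size of a recombination region can be derived directly from its marker interval. The sketch below parses names such as `Gm14_8059955` and computes sizes for Regions 5 and 6; the helper names are illustrative and the approach assumes both markers of an interval lie on the same chromosome.

```python
def parse_marker(name):
    """Split a marker name such as 'Gm14_8059955' into (chromosome, position)."""
    chrom_part, pos_part = name.split("_")
    if not chrom_part.startswith("Gm"):
        raise ValueError(f"unexpected marker name: {name}")
    return int(chrom_part[2:]), int(pos_part)


def interval_bounds(interval):
    """Return (start, end) positions in bp for 'markerA-markerB'."""
    left, right = interval.split("-")
    chrom_l, start = parse_marker(left)
    chrom_r, end = parse_marker(right)
    if chrom_l != chrom_r:
        raise ValueError("markers are on different chromosomes")
    return start, end


# Marker intervals copied from the table above.
region5 = interval_bounds("Gm14_8059955-Gm14_9506311")
region6 = interval_bounds("Gm14_9508613-Gm14_12648760")

size5 = region5[1] - region5[0]
size6 = region6[1] - region6[0]
span_5_to_6 = region6[1] - region5[0]

print(f"Region-5: {size5:,} bp; Region-6: {size6:,} bp")
print(f"Regions 5 and 6 together span {span_5_to_6 / 1e6:.2f} Mbp")
```

The combined span of Regions 5 and 6 comes to roughly 4.59 Mbp, which is consistent with the refined QTL interval of about 4.6 Mbp reported in the abstract.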
FIGURE 4. Individual RHD-NILs and their eight recombination regions with the physical start and stop positions, protein and oil content, and the mean with the standard deviation (SD) based on their t-test grouping. F-values for protein and oil content and the number of Wm82.a2.v1 annotated genes are displayed at the bottom. G. max segments are shown in white, G. soja segments in dark gray, and regions containing a recombination in light gray. Regions 5 and 6 are outlined in red to highlight the most significant regions for seed protein content.
FIGURE 5. Differences in phenotypic values of protein content (%) and oil content (%) from CLM&NOV between lines carrying different homozygous alleles at the markers Gm14_8059955 and Gm14_9508613. At Gm14_8059955, CC is the allele from G. max (Osage) and TT is the allele from G. soja (PI 593983). At Gm14_9508613, TT is the allele from G. max (Osage) and GG is the allele from G. soja (PI 593983). (A) Protein content for Gm14_8059955. (B) Protein content for Gm14_9508613. (C) Oil content for Gm14_8059955. (D) Oil content for Gm14_9508613. The whiskers represent the maximum and minimum values, the box spans the 25th to 75th percentiles, and the line in the box is the median value. The dots represent the density of the phenotypic values.
Candidate protein-related genes within regions 5 and 6.
| Gmax 2.0 Gene IDs | Start | Stop | Biological Process Descriptions | KOG Annotations | Region |
| Glyma.14G090200 | 8218662 | 8222883 | Amino acid transport | Amino acid transporter protein | Region-5 |
| Glyma.14G096200 | 9006426 | 9009129 | Amino acid transport | NA | Region-5 |
| Glyma.14G096600 | 9045982 | 9049959 | Amino acid transport | Beta-fructofuranosidase (invertase) | Region-5 |
| Glyma.14G098100 | 9259148 | 9266365 | Cellular modified amino acid biosynthesis | NA | Region-5 |
| Glyma.14G102700 | 10194805 | 10198050 | Aromatic amino acid family biosynthetic process | Chorismate mutase | Region-6 |
| Glyma.14G104800 | 10748114 | 10751676 | Regulation of amino acid import | NA | Region-6 |
| Glyma.14G105200 | 10798891 | 10799849 | Regulation of amino acid export | NA | Region-6 |
| Glyma.14G105900 | 10916033 | 10919283 | Amino acid transport | NA | Region-6 |
(a) Gene ID based on the Wm82.a2.v1 assembly.
(b) EuKaryotic Orthologous Groups (KOG) gene descriptions.
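The region assignment of each candidate gene can be checked by testing whether its start and stop coordinates fall inside the marker interval of Region-5 or Region-6. This sketch assumes the gene coordinates and the marker positions refer to the same assembly (Wm82.a2.v1); the function name `containing_region` is illustrative.

```python
# Region boundaries (bp) from the marker intervals of the recombination-region table.
REGIONS = {
    "Region-5": (8059955, 9506311),
    "Region-6": (9508613, 12648760),
}

# (gene ID, start, stop, region listed in the candidate-gene table)
GENES = [
    ("Glyma.14G090200", 8218662, 8222883, "Region-5"),
    ("Glyma.14G096200", 9006426, 9009129, "Region-5"),
    ("Glyma.14G096600", 9045982, 9049959, "Region-5"),
    ("Glyma.14G098100", 9259148, 9266365, "Region-5"),
    ("Glyma.14G102700", 10194805, 10198050, "Region-6"),
    ("Glyma.14G104800", 10748114, 10751676, "Region-6"),
    ("Glyma.14G105200", 10798891, 10799849, "Region-6"),
    ("Glyma.14G105900", 10916033, 10919283, "Region-6"),
]


def containing_region(start, stop):
    """Return the name of the region that fully contains [start, stop], or None."""
    for name, (low, high) in REGIONS.items():
        if low <= start and stop <= high:
            return name
    return None


for gene_id, start, stop, tabled_region in GENES:
    computed = containing_region(start, stop)
    print(f"{gene_id}: computed {computed}, tabled {tabled_region}")
```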