Suprio Ghosh1,2, Shengrui Zhang1, Muhammad Azam1, Kwadwo Gyapong Agyenim-Boateng1, Jie Qi1, Yue Feng1, Yecheng Li1, Jing Li1, Bin Li1, Junming Sun1.
Abstract
Soybean seeds are a primary source of the natural tocopherols used by the food and pharmaceutical industries, owing to their beneficial effects on human health. Selecting for higher seed tocopherol content along with other desirable traits is an important goal in soybean breeding. To identify the genomic loci and candidate genes controlling tocopherol content in soybean seeds, bulked-segregant analysis (BSA) was performed on a natural soybean population of 1525 accessions. The BSA was constructed from 98 accessions showing extreme phenotypic variation for the target trait: 49 accessions with extremely high and 49 with extremely low tocopherol content. A total of 144 variant sites and 109 predicted genes related to tocopherol content were identified, of which 83 genes were annotated with gene ontology (GO) functions. Furthermore, 13 enriched terms (p < 0.05) were detected, four of which were highly enriched: response to lipid, response to abscisic acid, transition metal ion transmembrane transporter activity, and double-stranded DNA binding. In particular, six candidate genes were detected in a genomic hotspot at 41.8-41.9 Mb on chromosome 5 based on ANNOtate VARiation (ANNOVAR) analysis. Among these genes, only Glyma.05G243400, which encodes a "translation elongation factor EF1A or initiation factor IF2gamma family protein", carried a non-synonymous mutation. Haplotype analysis showed that Glyma.05G243400 haplotypes differed highly significantly in tocopherol content across multiple experimental locations, suggesting that it may be a key candidate gene regulating soybean seed tocopherols. These findings provide novel gene resources related to seed tocopherols for further validation by genome editing, functional characterization, and genetic improvement aimed at enhancing tocopherol composition in soybean molecular breeding.
Keywords: SNP-index; bulk segregant analysis (BSA); candidate genes; next-generation sequencing (NGS); soybean (Glycine max L. Merrill); tocopherols
Year: 2022 PMID: 35807655 PMCID: PMC9269242 DOI: 10.3390/plants11131703
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
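The bulk construction described in the abstract (49 extremely high and 49 extremely low accessions out of 1525) amounts to taking the two tails of the phenotype distribution. A minimal Python sketch, assuming one tocopherol value per accession (for example a mean across environments, which the abstract does not specify); the function name and toy data are hypothetical:

```python
from typing import Dict, List, Tuple


def select_extreme_bulks(
    phenotypes: Dict[str, float], bulk_size: int
) -> Tuple[List[str], List[str]]:
    """Split a population into high and low phenotypic bulks.

    Accessions are ranked by phenotype; the `bulk_size` highest form the
    high bulk and the `bulk_size` lowest form the low bulk.
    """
    if bulk_size < 1 or 2 * bulk_size > len(phenotypes):
        raise ValueError("need 1 <= bulk_size <= len(phenotypes) // 2")
    # Rank accession IDs from lowest to highest phenotype value.
    ranked = sorted(phenotypes, key=phenotypes.__getitem__)
    low_bulk = ranked[:bulk_size]
    high_bulk = ranked[::-1][:bulk_size]  # highest value first
    return high_bulk, low_bulk


# Hypothetical toy data: accession ID -> seed tocopherol content.
toy = {"acc1": 210.0, "acc2": 185.5, "acc3": 320.1,
       "acc4": 150.2, "acc5": 290.7, "acc6": 240.3}
high, low = select_extreme_bulks(toy, bulk_size=2)
print(high)  # ['acc3', 'acc5']
print(low)   # ['acc4', 'acc2']
```

For the study's design this would correspond to `bulk_size=49` on the 1525-accession phenotype table, after which DNA from each bulk is pooled and sequenced.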
Summary of sequencing data quality.
| Sample a | Reference Genome | Number of Plants Bulked | Raw Bases (bp) | Clean Bases (bp) | Effective Rate (%) | Error Rate (%) | Q20 (%) | Q30 (%) | GC Content b (%) |
|---|---|---|---|---|---|---|---|---|---|
| VE-Low | Williams 82 | 49 | 57,765,692,100 | 57,645,759,600 | 99.79 | 0.03 | 97.66 | 93.37 | 36.25 |
| VE-High | Williams 82 | 49 | 53,145,146,700 | 53,070,136,800 | 99.86 | 0.03 | 97.41 | 92.77 | 36.52 |
a VE-Low, bulk DNA pool with low tocopherol content; VE-High, bulk DNA pool with high tocopherol content. b GC Content stands for guanine-cytosine content.
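The Effective Rate column is consistent with clean bases divided by raw bases, and Q20/Q30 are the usual Phred quality thresholds. A short Python check under that assumption (the definition is inferred from the numbers, not stated in the table):

```python
# Raw and clean base counts copied from the sequencing-quality table.
BULKS = {
    "VE-Low": (57_765_692_100, 57_645_759_600),
    "VE-High": (53_145_146_700, 53_070_136_800),
}


def effective_rate(raw_bases: int, clean_bases: int) -> float:
    """Percentage of raw bases kept after read filtering."""
    return 100.0 * clean_bases / raw_bases


def phred_error_probability(q: float) -> float:
    """Base-call error probability implied by Phred quality Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


for name, (raw, clean) in BULKS.items():
    print(f"{name}: effective rate = {effective_rate(raw, clean):.2f}%")
# VE-Low: effective rate = 99.79%
# VE-High: effective rate = 99.86%

# Q20 / Q30 = share of bases with error probability <= 1% / <= 0.1%.
print(phred_error_probability(20), phred_error_probability(30))
```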
Sequencing depth and coverage statistics.
| Sample | Mapped Reads | Total Reads | Mapping Rate (%) | Average Depth (×) | Coverage at Least 1× (%) | Coverage at Least 4× (%) |
|---|---|---|---|---|---|---|
| VE-High | 349,651,428 | 353,800,912 | 98.83 | 41.52 | 99.61 | 99.04 |
| VE-Low | 379,650,505 | 384,305,064 | 98.79 | 44.57 | 99.65 | 99.11 |
Note: VE-Low, bulk DNA pool with low tocopherol content; VE-High, bulk DNA pool with high tocopherol content.
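Mapping Rate is mapped reads divided by total reads, which reproduces the tabulated percentages. The depth helper below is only a back-of-envelope estimate; read length and genome size are not given in this table, so any values passed to it are hypothetical:

```python
# Mapped and total read counts copied from the depth/coverage table.
READS = {
    "VE-High": (349_651_428, 353_800_912),
    "VE-Low": (379_650_505, 384_305_064),
}


def mapping_rate(mapped: int, total: int) -> float:
    """Percentage of reads aligned to the reference genome."""
    return 100.0 * mapped / total


def mean_depth(mapped_reads: int, read_length: int, genome_size: int) -> float:
    """Rough mean depth: aligned bases spread evenly over the genome.

    Ignores duplicates, clipping and unmapped regions, so it will not
    reproduce a pipeline's reported depth exactly.
    """
    return mapped_reads * read_length / genome_size


for name, (mapped, total) in READS.items():
    print(f"{name}: mapping rate = {mapping_rate(mapped, total):.2f}%")
# VE-High: mapping rate = 98.83%
# VE-Low: mapping rate = 98.79%
```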
Numbers of different types of SNPs and InDels detected between the two soybean bulks.
| Category | SNP | InDel |
|---|---|---|
| Intergenic | 2,701,815 | 404,485 |
| Upstream | 207,958 | 57,177 |
| Downstream | 180,387 | 44,948 |
| Upstream/Downstream | 10,174 | 2853 |
| Intronic | 326,872 | 69,637 |
| Stop-gain | 1614 | 126 |
| Stop-loss | 315 | 27 |
| Frameshift deletion | - | 1902 |
| Frameshift insertion | - | 1744 |
| Non-frameshift deletion | - | 1513 |
| Non-frameshift insertion | - | 1323 |
| Synonymous | 55,156 | - |
| Non-synonymous | 74,303 | - |
| Splicing | 774 | 266 |
| Insertion | - | 282,838 |
| Deletion | - | 304,169 |
| Transitions (Ts) | 2,325,983 | - |
| Transversions (Tv) | 1,233,385 | - |
| Ts/Tv ratio | 1.886 | - |
| Total | 3,559,368 | 587,007 |
Note: Intergenic, variant lies in an intergenic region; Upstream, variant overlaps the 1-kb region upstream of the transcription start site; Downstream, variant overlaps the 1-kb region downstream of the transcription end site; Upstream/Downstream, variant overlaps the 1-kb region upstream of the transcription start site of one gene and, at the same time, the 1-kb region downstream of the transcription end site of another gene; Intronic, variant overlaps an intron; Exonic, variant overlaps a coding exon; Stop-gain, a variant (SNP or InDel) that leads to the immediate creation of a stop codon at the variant site; Stop-loss, a variant (SNP or InDel) that leads to the immediate elimination of a stop codon at the variant site; Frameshift deletion, a deletion causing a frameshift; Frameshift insertion, an insertion causing a frameshift; Non-frameshift deletion, a deletion causing no frameshift; Non-frameshift insertion, an insertion causing no frameshift; Synonymous, a single-nucleotide variant that does not change the encoded amino acid; Non-synonymous, a single-nucleotide variant that changes the encoded amino acid; Splicing, variant lies within 2 bp of a splicing junction; Transitions (Ts), purine-to-purine or pyrimidine-to-pyrimidine substitutions; Transversions (Tv), purine-to-pyrimidine or pyrimidine-to-purine substitutions.
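The SNP counts in this table can be cross-checked: the functional categories sum to the total, transitions plus transversions also equal the total, and their ratio gives Ts/Tv. A small Python sketch (counts copied from the table; the classifier is a generic helper, not part of the authors' pipeline):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return pair <= PURINES or pair <= PYRIMIDINES


# SNP counts per functional category, copied from the table.
SNP_BY_CATEGORY = {
    "Intergenic": 2_701_815, "Upstream": 207_958, "Downstream": 180_387,
    "Upstream/Downstream": 10_174, "Intronic": 326_872, "Stop-gain": 1_614,
    "Stop-loss": 315, "Synonymous": 55_156, "Non-synonymous": 74_303,
    "Splicing": 774,
}
TRANSITIONS, TRANSVERSIONS, TOTAL_SNPS = 2_325_983, 1_233_385, 3_559_368

category_sum = sum(SNP_BY_CATEGORY.values())
ts_tv = TRANSITIONS / TRANSVERSIONS
print(category_sum == TOTAL_SNPS, TRANSITIONS + TRANSVERSIONS == TOTAL_SNPS)  # True True
print(f"Ts/Tv = {ts_tv:.3f}")  # Ts/Tv = 1.886
```

The InDel column does not close the same way: Insertion plus Deletion equals the total (282,838 + 304,169 = 587,007), but the listed functional InDel categories sum to 586,001, so some annotation classes are apparently not shown.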
Figure 1. The positions of tocopherol-related loci and corresponding candidate genes on different chromosomes of soybean.
Putative candidate genes containing non-synonymous, stop-gain, and frameshift mutation variants.
| Position (bp) | Chromosome | Ref | Alt | Effect | Gene ID | Ortholog in Arabidopsis | Description |
|---|---|---|---|---|---|---|---|
| 39758623 | Chr05 | A | G | Nonsynonymous | Glyma.05G217500 | AT1G64450.1 | Glycine-rich protein family |
| 41807338 | Chr05 | T | G | Nonsynonymous | Glyma.05G243400 | AT1G18070.2 | Translation elongation factor EF1A/initiation factor IF2gamma family protein |
| 48428377 | Chr06 | C | T | Nonsynonymous | Glyma.06G295300 | AT4G28140.1 | Integrase-type DNA-binding superfamily protein |
| 17002951 | Chr08 | A | G | Nonsynonymous | Glyma.08G210000 | AT1G15420.1 | Repeat-containing protein |
| 14945609 | Chr13 | A | C | Nonsynonymous | Glyma.13G052400 | - | - |
| 46481282 | Chr14 | C | G | Nonsynonymous | Glyma.14G199800 | AT1G13960.1 | WRKY DNA-binding protein 4 |
| 35333785 | Chr16 | C | T | Nonsynonymous | Glyma.16G190900 | AT2G34930.1 | Disease resistance family protein/LRR family protein |
| 36350176 | Chr16 | C | G | Nonsynonymous | Glyma.16G202100 | AT4G05200.1 | Cysteine-rich RLK (Receptor-like protein kinase) 25 |
| 36970367 | Chr16 | G | C | Nonsynonymous | Glyma.16G210600 | AT5G36930.2 | Disease resistance protein (TIR-NBS-LRR class) family |
| 6309627 | Chr17 | C | T | Nonsynonymous | Glyma.17G081100 | AT2G03820.1 | Nonsense-mediated mRNA decay NMD3 family protein |
| 6333162 | Chr17 | A | T | Stop-gain | Glyma.17G081500 | - | - |
| 14819413 | Chr08 | GCAGT | - | Frameshift Deletion | Glyma.08G184700 | AT5G24750.1 | UDP-Glycosyltransferase superfamily protein |
| 43894622 | Chr15 | A | - | Frameshift Deletion | Glyma.15G233300 | AT1G55020.1 | Lipoxygenase 1; Linoleate 9S-lipoxygenase/Linoleate 9-lipoxygenase |
Note: Ref, Reference; Alt, Alternative.
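The selection behind this table, keeping only variants whose predicted effect can alter the protein and then narrowing to a hotspot, can be sketched as a simple filter. A minimal Python illustration, assuming the effect labels as they appear in the table; the `Variant` type, the `DAMAGING` set, and writing the 41.8-41.9 Mb hotspot as 41,800,000-41,900,000 bp are illustrative choices, not the authors' code:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str
    gene: str


# Effects treated as potentially protein-altering (mirrors the table caption).
DAMAGING = {"Nonsynonymous", "Stop-gain", "Frameshift Deletion", "Frameshift Insertion"}


def candidates(variants: List[Variant], chrom: str, start: int, end: int) -> List[Variant]:
    """Protein-altering variants on `chrom` with start <= pos <= end."""
    return [
        v for v in variants
        if v.chrom == chrom and start <= v.pos <= end and v.effect in DAMAGING
    ]


# A few rows copied from the table.
rows = [
    Variant("Chr05", 39_758_623, "A", "G", "Nonsynonymous", "Glyma.05G217500"),
    Variant("Chr05", 41_807_338, "T", "G", "Nonsynonymous", "Glyma.05G243400"),
    Variant("Chr15", 43_894_622, "A", "-", "Frameshift Deletion", "Glyma.15G233300"),
]
hits = candidates(rows, "Chr05", 41_800_000, 41_900_000)
print([v.gene for v in hits])  # ['Glyma.05G243400']
```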
Figure 2. Variant association analyses to identify candidate regions related to tocopherol content in soybean, based on BSA-seq of a panel of soybean accessions. (A) Δ(InDel-index) and Δ(SNP-index) plots with statistical confidence intervals (p < 0.05). The dotted grey line represents the SNP/InDel indices. Blue dots represent Δ(SNP-index) values, and the red line represents the sliding-window average of Δ(SNP/InDel-index), calculated over a 2 Mb window slid in 10 kb steps. The green dotted line shows the association threshold (0.90). (B) Candidate genes in the genomic locus at 41.8-41.9 Mb on chromosome 5.
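The Δ(SNP-index) in Figure 2 follows the usual BSA-seq definition: at each site, a bulk's SNP-index is the fraction of its reads carrying the alternate allele, Δ is the difference between the two bulks, and values are smoothed in sliding windows. A minimal Python sketch, assuming high bulk minus low bulk as the sign convention (the caption does not state it) and a plain arithmetic mean per window; the confidence intervals and the 0.90 threshold are not reproduced, and all names and toy data are hypothetical:

```python
import bisect
from typing import List, Tuple


def snp_index(alt_reads: int, total_reads: int) -> float:
    """SNP-index of one bulk: fraction of reads carrying the alternate allele."""
    return alt_reads / total_reads


def delta_snp_index(high: Tuple[int, int], low: Tuple[int, int]) -> float:
    """Δ(SNP-index) at one site; each argument is (alt_reads, total_reads)."""
    return snp_index(*high) - snp_index(*low)


def sliding_window_mean(
    positions: List[int], values: List[float], window: int, step: int
) -> List[Tuple[int, float]]:
    """Mean of `values` in windows [start, start + window), advanced by `step`.

    `positions` must be sorted ascending. Returns (window midpoint, mean)
    for every window that contains at least one site.
    """
    out: List[Tuple[int, float]] = []
    if not positions:
        return out
    start = positions[0]
    while start <= positions[-1]:
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_left(positions, start + window)
        if hi > lo:
            chunk = values[lo:hi]
            out.append((start + window // 2, sum(chunk) / len(chunk)))
        start += step
    return out


# Hypothetical toy sites: (position, (alt, total) high bulk, (alt, total) low bulk).
sites = [(100, (8, 10), (2, 10)), (200, (9, 10), (1, 10)),
         (300, (5, 10), (5, 10)), (400, (3, 10), (7, 10))]
pos = [p for p, _, _ in sites]
delta = [delta_snp_index(h, l) for _, h, l in sites]
# Figure 2 reports a 2 Mb window slid by 10 kb; toy values are scaled down.
for mid, avg in sliding_window_mean(pos, delta, window=200, step=100):
    print(mid, round(avg, 2))
```

Using `bisect` on sorted positions keeps each window lookup logarithmic, which matters when a chromosome has hundreds of thousands of sites and thousands of overlapping windows.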
Figure 3. Protein structure (motif) variants of Glyma.05G243400 between the reference (WT) and mutated proteins, predicted using the PredictProtein software: https://predictprotein.org (accessed on 1 June 2022).
Figure 4. Haplotype analysis of Glyma.05G243400 in the natural soybean population. REF denotes the reference allele (the allele with more counts in the dataset, identical to the reference genome), and ALT denotes the alternate allele (the allele not represented by REF). A17, Anhui 2017; B17, Beijing 2017; B18, Beijing 2018; H17, Hainan 2017; H18, Hainan 2018.
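Figure 4 compares tocopherol content between accessions carrying the REF and ALT alleles of Glyma.05G243400 in each environment. The caption does not name the statistical test, so the following Python sketch assumes Welch's two-sample t-test as one reasonable choice; the function names and data are hypothetical:

```python
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def compare_alleles(
    genotype: Dict[str, str], phenotype: Dict[str, float]
) -> Tuple[float, float, float, float]:
    """Group accessions by allele ('REF' or 'ALT') at one site in one environment.

    Returns (mean REF, mean ALT, t, df). Accessions without a phenotype are skipped.
    """
    ref = [phenotype[a] for a, g in genotype.items() if g == "REF" and a in phenotype]
    alt = [phenotype[a] for a, g in genotype.items() if g == "ALT" and a in phenotype]
    t, df = welch_t(ref, alt)
    return mean(ref), mean(alt), t, df


# Hypothetical toy data for one environment (e.g. one of A17, B17, ...).
geno = {"a1": "REF", "a2": "REF", "a3": "REF", "a4": "ALT", "a5": "ALT", "a6": "ALT"}
pheno = {"a1": 1.0, "a2": 2.0, "a3": 3.0, "a4": 4.0, "a5": 5.0, "a6": 6.0}
print(compare_alleles(geno, pheno))
```

Running this once per environment mirrors the per-location panels; if SciPy is available, `scipy.stats.ttest_ind(ref, alt, equal_var=False)` computes the same Welch statistic together with a two-sided p-value.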
Gene ontology (GO) enrichment of candidate genes related to tocopherol content in soybean.
| GO ID | Term | Annotated | Count | Expected | p-Value | Genes |
|---|---|---|---|---|---|---|
| GO:0033993 | Response to lipid | 565 | 5 | 1.02 | 0.004 | Glyma.04G199900, Glyma.05G243800, Glyma.05G244100, Glyma.06G249800, Glyma.17G219400 |
| GO:0009737 | Response to abscisic acid | 437 | 4 | 0.79 | 0.008 | Glyma.05G243800, Glyma.05G244100, Glyma.06G249800, Glyma.17G219400 |
| GO:0097305 | Response to alcohol | 502 | 4 | 0.91 | 0.013 | Glyma.05G243800, Glyma.05G244100, Glyma.06G249800, Glyma.17G219400 |
| GO:1901700 | Response to oxygen-containing compound | 1104 | 6 | 2.00 | 0.014 | Glyma.03G173200, Glyma.04G199900, Glyma.05G243800, Glyma.05G244100, Glyma.06G249800, Glyma.17G219400 |
| GO:0001101 | Response to acid chemical | 813 | 5 | 1.47 | 0.016 | Glyma.04G199900, Glyma.05G243800, Glyma.05G244100, Glyma.06G249800, Glyma.17G219400 |
| GO:0000041 | Transition metal ion transport | 116 | 2 | 0.21 | 0.019 | Glyma.14G196200, Glyma.17G219400 |
| GO:0009845 | Seed germination | 117 | 2 | 0.21 | 0.019 | Glyma.05G243800, Glyma.05G244100 |
| GO:0090351 | Seedling development | 131 | 2 | 0.24 | 0.024 | Glyma.05G243800, Glyma.05G244100 |
| GO:0042221 | Response to chemical | 2380 | 9 | 4.31 | 0.026 | Glyma.03G173200, Glyma.04G199900, Glyma.05G183800, Glyma.05G243800, Glyma.05G244100, Glyma.06G249800, Glyma.08G169800, Glyma.17G219400, Glyma.18G293300 |
| GO:0071215 | Cellular response to abscisic acid stimulus | 192 | 2 | 0.35 | 0.047 | Glyma.05G243800, Glyma.06G249800 |
| GO:0009719 | Response to endogenous stimulus | 1480 | 6 | 2.68 | 0.050 | Glyma.03G173200, Glyma.04G199900, Glyma.05G243800, Glyma.05G244100, Glyma.06G249800, Glyma.17G219400 |
| GO:0046915 | Transition metal ion transmembrane transporter activity | 103 | 2 | 0.21 | 0.019 | Glyma.14G196200, Glyma.17G219400 |
| GO:0003690 | Double-stranded DNA binding | 150 | 2 | 0.31 | 0.038 | Glyma.01G042900, Glyma.08G306100 |
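The Expected column scales with Annotated across the biological-process rows (roughly 0.0018 × Annotated), which is what a classic over-representation test predicts: Expected = Annotated × (candidate genes / background genes), with the p-value given by the hypergeometric upper tail. A minimal Python sketch, assuming that test (the table does not name it) and a hypothetical background size, since the table does not report one:

```python
from math import comb


def expected_count(annotated: int, n_sig: int, n_universe: int) -> float:
    """Genes expected in a term if `n_sig` genes were drawn at random."""
    return annotated * n_sig / n_universe


def enrichment_p(k: int, annotated: int, n_sig: int, n_universe: int) -> float:
    """One-sided hypergeometric P(X >= k), i.e. Fisher's exact test for over-representation.

    X counts how many of `n_sig` randomly drawn genes fall in a term that
    annotates `annotated` of the `n_universe` background genes.
    """
    top = min(annotated, n_sig)
    hits = sum(
        comb(annotated, i) * comb(n_universe - annotated, n_sig - i)
        for i in range(k, top + 1)
    )
    return hits / comb(n_universe, n_sig)


# Hypothetical background size; 83 is the number of GO-annotated genes in the abstract.
N_UNIVERSE = 40_000
print(expected_count(565, 83, N_UNIVERSE), enrichment_p(5, 565, 83, N_UNIVERSE))
```

Exact integer arithmetic with `math.comb` avoids underflow in the tail sum; with a different background size the printed values will not match the table, which is expected for this sketch.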