Jialin Liu1, Huimin Xie1, Ting Lin1, Congxiao Tie1, Huolin Luo1, Boyun Yang1, Dongjin Xiong2.
Abstract
Soybean cultivars bred in the Huang-Huai-Hai region (HR) are rich in pedigree information. To date, few reports have used genome-wide resequencing data to characterize the genetic variants, population structure and genetic diversity of cultivars in this region. To describe these features, we assembled a population consisting entirely of cultivars and re-sequenced 181 soybean cultivar genomes at an average depth of 10.38×. In total, 11,185,589 single nucleotide polymorphisms (SNPs) and 2,520,208 insertion-deletions (InDels) were identified across all 20 chromosomes. A considerable number of putative variants lay in important genome regions, where they may affect genes involved in key biological processes. The 181 varieties were divided into five sub-populations according to their breeding years: SA (1963-1980), SB (1983-1988), SC (1991-2000), SD (2001-2011) and SE (2012-2017). PCA and population structure analysis showed no obvious grouping trend. The LD semi-decay distances of SD and SE were 182 kb and 227 kb, respectively. Sub-population A (SA) had the highest nucleotide polymorphism (π). Over time, the nucleotide polymorphism of SB and SC decreased gradually, whereas that of SD and SE rose rapidly, indicating a sharp increase in genetic diversity during the most recent 20 years and suggesting that breeders may have pursued different breeding goals in different breeding periods in HR. The PIC statistics gave results very similar to those for π. This study analyzes the genetic variants and characterizes the population structure and genetic diversity of soybean cultivars bred in different decades in HR, and provides a theoretical reference for similar studies.
Year: 2022 PMID: 35149770 PMCID: PMC8837640 DOI: 10.1038/s41598-022-06447-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Summary of sequencing, SNP and InDel information on the genome. (A) Frequency distribution histogram of sequencing depth of 181 cultivars in HR. (B) Frequency distribution histogram of mapping ratio of 181 cultivars in HR. (C) Frequency distribution histogram of coverage ratio of 181 cultivars in HR. (D) SNP and InDel counts on every chromosome. (E) Percentage of SNPs in each soybean genome region. (F) Percentage of InDels in each soybean genome region. (G) Numbers of transition/transversion mutations. Note that the plots were generated using Microsoft Excel 2016.
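Panel (G) counts transition and transversion mutations. As a minimal sketch of how such counts can be derived, the Python below classifies single-base substitutions by purine/pyrimidine class. The function names and the toy input are hypothetical and are not taken from the study, whose variant-calling pipeline is not described in this record.

```python
# Purines and pyrimidines; a transition stays within one class,
# a transversion switches between the two classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Compute the transition/transversion ratio from (ref, alt) pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    if counts["transversion"] == 0:
        raise ZeroDivisionError("no transversions in input")
    return counts["transition"] / counts["transversion"]


# Hypothetical toy input (two transitions, two transversions); not study data.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
toy_ratio = ts_tv_ratio(toy_snps)
print(toy_ratio)
```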
Figure 2. Gene ontology annotation plot for 1314 genes containing SNPs that were mutated in all varieties in HR. BP: biological process; MF: molecular function; CC: cellular component. The x axis shows the percentage of genes under a GO term relative to the total number of annotated genes. Note that the plot was generated using Microsoft Excel 2016.
Summary of the most relevant results from the GO enrichment analysis of the genes affected by non-synonymous SNPs.
| Description | Genes | Sequence ontology | Non-synonymous SNPs |
|---|---|---|---|
| Response to stimulus | Glyma.03g081100 | stop_gained | 1 |
| Metabolic process | Glyma.01g009500 | stop_gained | 2 |
| | Glyma.03g081100 | stop_gained | 1 |
| | Glyma.14g209700 | stop_gained | 1 |
| | Glyma.20g015100 | stop_gained | 1 |
| | Glyma.20g240900 | stop_gained | 1 |
| Cellular process | Glyma.03g081100 | stop_gained | 1 |
| | Glyma.20g240900 | stop_gained | 1 |
| Biological regulation | Glyma.03g081100 | stop_gained | 1 |
| Cellular component organization or biogenesis | Glyma.03g081100 | stop_gained | 1 |
| Binding | Glyma.03g034400 | stop_gained | 1 |
| | Glyma.03g173800 | stop_gained | 1 |
| | Glyma.15g166300 | stop_gained | 1 |
| | Glyma.18g100200 | stop_lost | 1 |
| | Glyma.18g161900 | stop_lost | 1 |
| Catalytic activity | Glyma.01g009500 | stop_gained | 2 |
| | Glyma.03g081100 | stop_gained | 1 |
| | Glyma.20g015100 | stop_gained | 1 |
| | Glyma.20g240900 | stop_gained | 1 |
| Membrane | Glyma.20g240900 | stop_gained | 1 |
Figure 3. Gene ontology annotation plot for 1241 genes containing InDels that were mutated in all varieties in HR. BP: biological process; MF: molecular function; CC: cellular component. The x axis shows the percentage of genes under a GO term relative to the total number of annotated genes. Note that the plot was generated using Microsoft Excel 2016.
Summary of the most relevant results from the GO enrichment analysis of the genes affected by non-synonymous InDels.
| Description | Genes | Sequence ontology | Non-synonymous InDels |
|---|---|---|---|
| Metabolic process | Glyma.01g013200 | frameshift_variant&start_lost | 1 |
| | Glyma.04g030100 | frameshift_variant&stop_lost | 1 |
| | Glyma.04g086400 | conservative_inframe_insertion | 1 |
| | Glyma.15g187700 | frameshift_variant&stop_lost | 1 |
| | Glyma.15g234300 | frameshift_variant&stop_gained | 1 |
| Cellular process | Glyma.15g187700 | frameshift_variant&stop_lost | 1 |
| Localization | Glyma.16g080500 | frameshift_variant&stop_lost | 1 |
| Catalytic activity | Glyma.01g013200 | frameshift_variant&start_lost | 1 |
| | Glyma.04g030100 | frameshift_variant&start_lost | 1 |
| | Glyma.04g086400 | conservative_inframe_insertion | 1 |
| | Glyma.15g187700 | frameshift_variant&stop_lost | 1 |
| | Glyma.15g234300 | frameshift_variant&stop_gained | 1 |
| | Glyma.16g066200 | conservative_inframe_insertion | 1 |
| | Glyma.19g179300 | conservative_inframe_insertion | 1 |
| Binding | Glyma.04g030100 | frameshift_variant&start_lost | 1 |
| | Glyma.07g078000 | frameshift_variant&stop_lost | 1 |
| | Glyma.07g140200 | conservative_inframe_insertion | 1 |
| | Glyma.09g278400 | frameshift_variant&stop_gained | 1 |
| | Glyma.14g088700 | conservative_inframe_deletion | 1 |
| | Glyma.15g187700 | frameshift_variant&stop_lost | 1 |
| | Glyma.16g066200 | conservative_inframe_insertion | 1 |
| | Glyma.16g080500 | frameshift_variant&stop_lost | 1 |
| | Glyma.19g179300 | conservative_inframe_insertion | 1 |
Figure 4. Population structure analysis. (A) Principal component analysis (PCA) chart of soybean cultivars in the HR. (B) Neighbor-joining (NJ) tree. (C) LD decay of SD, SE and the entire group. (D) Predictive log-likelihood as a function of the number of ancestral populations in the HR cultivated soybean population. Note that plots A and C were generated by self-written R scripts with R language version 4.02 (http://www.R-project.org), plot B was generated by ITOL (https://itol.embl.de/) and plot D was generated using Microsoft Excel 2016.
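Panel (C) and the LD semi-decay distances quoted in the abstract rest on a pairwise linkage disequilibrium statistic. The record does not say which tool or definition the authors used, so the following Python is only a sketch under two assumptions: LD is measured as r² between biallelic loci on phased 0/1 haplotypes, and the semi-decay distance is the first distance at which mean r² falls to half of its maximum. All names and numbers in it are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n  # frequency of allele 1 at locus B
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom


def half_decay_distance(curve):
    """First distance at which mean r^2 is at most half of its maximum.

    `curve` is a list of (distance_kb, mean_r2) pairs sorted by distance.
    Returns None if the curve never drops that far.
    """
    max_r2 = max(r2 for _, r2 in curve)
    for dist, r2 in curve:
        if r2 <= max_r2 / 2:
            return dist
    return None


# Hypothetical toy values for illustration only; not data from the study.
perfect = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
independent = ld_r2([0, 1, 0, 1], [0, 0, 1, 1])
toy_curve = [(1, 0.8), (50, 0.5), (100, 0.35), (200, 0.3)]
toy_half = half_decay_distance(toy_curve)
print(perfect, independent, toy_half)
```

Real decay curves are usually smoothed or binned before the half-decay point is read off, so the threshold rule here is a simplification.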
Nucleotide polymorphisms (π) and the polymorphic information content (PIC) of sub-populations at different breeding stages in the HR.
| Sub | Year of release | Accessions | π (× 10⁻³) | PIC |
|---|---|---|---|---|
| A | 1963–1980 | 12 | 1.54 | 0.242 |
| B | 1983–1988 | 8 | 1.27 | 0.197 |
| C | 1991–2000 | 7 | 1.24 | 0.190 |
| D | 2001–2010 | 90 | 1.41 | 0.234 |
| E | 2011–2017 | 64 | 1.38 | 0.227 |
Sub: sub-population names of different breeding stages in HR; Accessions: number of accessions in each sub-population; π: genetic diversity (× 10⁻³) of each sub-population; PIC: polymorphic information content of each sub-population.
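For readers unfamiliar with the two statistics in this table, the sketch below shows one common way to compute them in Python. It assumes biallelic SNPs, the usual unbiased per-site estimator of nucleotide diversity, n/(n−1)·2pq, and the PIC definition of Botstein et al. (1980). The record does not state which software or estimator the authors used, so the function names and toy inputs are illustrative assumptions only.

```python
def site_pi(alt_count, n_chrom):
    """Unbiased per-site nucleotide diversity for one biallelic site."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n_chrom
    return n_chrom / (n_chrom - 1) * 2 * p * (1 - p)


def mean_pi(alt_counts, n_chrom, n_sites_total):
    """Average pi per base: sum over variable sites divided by the number
    of sites surveyed (invariant sites contribute zero)."""
    return sum(site_pi(c, n_chrom) for c in alt_counts) / n_sites_total


def pic(freqs):
    """Polymorphic information content (Botstein et al. 1980):
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homo = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homo - cross


# Hypothetical toy inputs; not data from the study.
toy_pic = pic([0.5, 0.5])            # maximum PIC for a biallelic marker
toy_site = site_pi(2, 4)             # 2 alternate alleles among 4 chromosomes
toy_mean = mean_pi([2, 1], 4, 1000)  # two variable sites in a 1000-bp window
print(toy_pic, toy_site, toy_mean)
```

For a biallelic marker PIC cannot exceed 0.375, which is consistent with the values of 0.190 to 0.242 reported in the table.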
Distribution of SNPs in the core SNP set across chromosomes.
| Chromosome | SNPs | Chromosome | SNPs |
|---|---|---|---|
| Chr01 | 230,431 | Chr12 | 125,812 |
| Chr02 | 211,263 | Chr13 | 237,867 |
| Chr03 | 273,640 | Chr14 | 209,270 |
| Chr04 | 277,013 | Chr15 | 368,802 |
| Chr05 | 138,715 | Chr16 | 260,243 |
| Chr06 | 273,489 | Chr17 | 220,101 |
| Chr07 | 191,174 | Chr18 | 409,735 |
| Chr08 | 207,390 | Chr19 | 230,387 |
| Chr09 | 250,745 | Chr20 | 185,048 |
| Chr10 | 228,476 | Total | 4,666,538 |
| Chr11 | 136,937 | | |
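As a quick consistency check on the table above, the per-chromosome counts can be summed and compared with the reported total. The Python below is a minimal sketch that uses only the values printed in the table; the variable names are arbitrary.

```python
# SNP counts per chromosome, transcribed from the table above.
core_snps = {
    "Chr01": 230_431, "Chr02": 211_263, "Chr03": 273_640, "Chr04": 277_013,
    "Chr05": 138_715, "Chr06": 273_489, "Chr07": 191_174, "Chr08": 207_390,
    "Chr09": 250_745, "Chr10": 228_476, "Chr11": 136_937, "Chr12": 125_812,
    "Chr13": 237_867, "Chr14": 209_270, "Chr15": 368_802, "Chr16": 260_243,
    "Chr17": 220_101, "Chr18": 409_735, "Chr19": 230_387, "Chr20": 185_048,
}
reported_total = 4_666_538  # "Total" cell in the table

computed_total = sum(core_snps.values())
richest = max(core_snps, key=core_snps.get)   # chromosome with most SNPs
poorest = min(core_snps, key=core_snps.get)   # chromosome with fewest SNPs

print(computed_total == reported_total)
print(richest, poorest)
```

The counts do sum to the reported total, with Chr18 carrying the most core SNPs and Chr12 the fewest.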