Davoud Torkamaneh, Jérôme Laroche, Babu Valliyodan, Louise O'Donoughue, Elroy Cober, Istvan Rajcan, Ricardo Vilela Abdelnoor, Avinash Sreedasyam, Jeremy Schmutz, Henry T Nguyen, François Belzile.
Abstract
Here, we describe a worldwide haplotype map for soybean (GmHapMap) constructed using whole-genome sequence data for 1007 Glycine max accessions and yielding 14.9 million variants as well as 4.3 million tag single-nucleotide polymorphisms (SNPs). When sampling random subsets of these accessions, the number of variants and tag SNPs plateaued beyond approximately 800 and 600 accessions, respectively. This suggests extensive coverage of diversity within the cultivated soybean. GmHapMap variants were imputed onto 21 618 previously genotyped accessions with up to 96% success for common alleles. A local association analysis performed with the imputed data, using markers located in a 1-Mb region known to contribute to seed oil content, enabled us to identify a candidate causal SNP residing in the NPC1 gene. We determined gene-centric haplotypes (407 867 GCHs) for the 55 589 genes and showed that such haplotypes can help to identify alleles that differ in the resulting phenotype. Finally, we predicted 18 031 putative loss-of-function (LOF) mutations in 10 662 genes and illustrated how such a resource can be used to explore gene function. The GmHapMap provides a unique worldwide resource for applied soybean genomics and breeding.
Keywords: genetic variants; haplotype; haplotype map; imputation; loss-of-function mutation; soybean; whole-genome sequencing
Year: 2020 PMID: 32794321 PMCID: PMC7868971 DOI: 10.1111/pbi.13466
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803
Figure 1. Description of GmHapMap. (a) Geographical distribution of GmHapMap accessions. (b) Venn diagram representing the degree of overlap among variants called using the two collections of sequenced soybean accessions. (c) Population structure analysis using all SNPs representing six different subpopulations (K = 6) in the GmHapMap collection. (d) Distribution of genetic diversity among subpopulations of GmHapMap.
Figure 2. (a) Average number of variants (pink) and tag SNPs (blue) detected in random subsets of N accessions (where N = 100, 200, etc.). Each average was derived from 20 random subsamples. (b) Imputation accuracy as a function of allele frequency for six different scenarios: three experimentally derived genotype datasets (SoySNP50K, GBS and GBS/SoySNP50K) combined with two reference panels (REF‐I and REF‐II).
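The subsampling behind Figure 2(a) can be illustrated with a minimal sketch. It is not the authors' pipeline: the input format (a dict from accession ID to the set of variant IDs that accession carries), the function name `saturation_curve` and the toy data are all assumptions made for illustration. It draws 20 random subsets per subset size and averages the number of distinct variants observed.

```python
import random


def saturation_curve(genotypes, subset_sizes, n_reps=20, seed=0):
    """Mean number of distinct variants seen in random subsets of accessions.

    genotypes: dict mapping accession ID -> set of variant IDs carried by
               that accession (hypothetical input format).
    """
    rng = random.Random(seed)
    accessions = sorted(genotypes)
    curve = {}
    for n in subset_sizes:
        counts = []
        for _ in range(n_reps):
            subset = rng.sample(accessions, n)  # sampling without replacement
            seen = set().union(*(genotypes[a] for a in subset))
            counts.append(len(seen))
        curve[n] = sum(counts) / n_reps
    return curve


# Toy data (synthetic, not from GmHapMap): 40 accessions, 50 possible variants.
toy_rng = random.Random(1)
toy = {f"acc{i}": {toy_rng.randrange(50) for _ in range(10)} for i in range(40)}
curve = saturation_curve(toy, subset_sizes=[5, 10, 20, 40])
for n, mean_count in curve.items():
    print(n, mean_count)
```

A plateau appears when increasing the subset size stops adding new variants, which is the pattern the caption describes for the real data.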
Figure 3. Description of GCHs characterized in the GmHapMap dataset. (a) Distribution of the number of genes that have a given number of predicted GCHs. (b) Distribution of the number of SNPs residing in a 10‐kb window in and around genes in soybean according to the number of gene‐centric haplotypes (GCHs) defined using HaplotypeMiner. (c) Distribution of the mean length of genes and GCHs according to the number of GCHs defined by HaplotypeMiner. Haplotype length is defined as the distance between the two retained SNP markers, one on each side of the middle of the gene, that are the furthest apart from one another. (d) Schematic representation of predicted GCHs for GmGIa.
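The haplotype length definition in the Figure 3(c) caption can be written out as a short sketch. This is not HaplotypeMiner code: the function name, the argument names and the example coordinates are hypothetical. The caption also does not say how a SNP exactly at the gene midpoint, or a gene with retained SNPs on only one side, is handled; the sketch assumes the former is ignored and the latter has no defined length.

```python
def haplotype_length(retained_snp_positions, gene_start, gene_end):
    """Distance between the furthest retained SNPs on either side of the gene midpoint.

    Returns None when no retained SNP lies on one of the two sides
    (an assumption; the source does not describe this case).
    """
    midpoint = (gene_start + gene_end) / 2
    left = [p for p in retained_snp_positions if p < midpoint]
    right = [p for p in retained_snp_positions if p > midpoint]
    if not left or not right:
        return None
    # Furthest SNP to the left is the smallest position; to the right, the largest.
    return max(right) - min(left)


# Hypothetical gene spanning 4000-6000 bp with four retained SNPs.
print(haplotype_length([1200, 4800, 5600, 9100], 4000, 6000))  # 7900
```

Under this reading a haplotype can extend well beyond the gene itself, since the retained markers come from the window in and around the gene.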
Number of loss‐of‐function variants by sequence ontology (SO)
| SO term | SNP | MNP | INS | DEL | Total variants | Genes |
|---|---|---|---|---|---|---|
| Splice site‐disrupting (donor) | 1270 | 38 | 247 | 205 | 1760 | 1640 |
| Splice site‐disrupting (acceptor) | 1546 | 52 | 207 | 146 | 1951 | 1803 |
| Stop codon‐introducing | 2826 | 149 | 100 | 7 | 3082 | 2418 |
| Frameshift‐inducing | 0 | 0 | 4158 | 6596 | 10 754 | 6718 |
| Start/Stop codon‐disrupting | 345 | 40 | 54 | 45 | 484 | 452 |
| Total | 5987 | 279 | 4766 | 6999 | 18 031 | 13 031 |
| Total number of genes affected by LOF variants | | | | | | 10 662 |
Some genes were affected by more than one LOF mutation; therefore, the total number of affected genes (10 662) is lower than the sum of the per-category gene counts (13 031).
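The arithmetic in the table and its footnote can be checked with a short sketch. The row values are copied from the table; the variable names and the two-category toy example are illustrative assumptions. The sketch also assumes that each row's gene count is a count of distinct genes, so that a gene appearing in several categories is what inflates the column sum.

```python
# Rows copied from the LOF table: (SNP, MNP, INS, DEL, total variants, genes).
LOF_ROWS = {
    "Splice site-disrupting (donor)": (1270, 38, 247, 205, 1760, 1640),
    "Splice site-disrupting (acceptor)": (1546, 52, 207, 146, 1951, 1803),
    "Stop codon-introducing": (2826, 149, 100, 7, 3082, 2418),
    "Frameshift-inducing": (0, 0, 4158, 6596, 10754, 6718),
    "Start/Stop codon-disrupting": (345, 40, 54, 45, 484, 452),
}
DISTINCT_GENES = 10662  # "Total number of genes affected by LOF variants"

# Each row's variant types should add up to its reported total.
for name, (snp, mnp, ins, dele, total, _genes) in LOF_ROWS.items():
    assert snp + mnp + ins + dele == total, name

total_variants = sum(row[4] for row in LOF_ROWS.values())
summed_genes = sum(row[5] for row in LOF_ROWS.values())
print(total_variants, summed_genes, summed_genes - DISTINCT_GENES)
# prints: 18031 13031 2369

# Toy illustration (hypothetical gene IDs): g2 carries two kinds of LOF variant,
# so it is counted in two categories but only once in the union.
toy = {"donor": {"g1", "g2"}, "frameshift": {"g2", "g3"}}
toy_summed = sum(len(genes) for genes in toy.values())  # 4
toy_distinct = len(set().union(*toy.values()))          # 3
print(toy_summed, toy_distinct)
```

The difference of 2369 is the number of gene-by-category entries that repeat a gene already counted in another category.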
Figure 4. Phenotypic variation observed between accessions with (blue) and without (red) a predicted LOF mutation in four different genes. (a) FAD3A, a key gene for linolenic acid synthesis; (b) GmJ, a key gene for the long juvenile trait; (c) GmGIa, a key gene controlling maturity; (d) KASIIa, a key gene in the oil biosynthesis pathway. In each case, the number of accessions sharing the same allele (and for which phenotypic data were at hand) is indicated.