Won-Hyong Chung, Namhee Jeong, Jiwoong Kim, Woo Kyu Lee, Yun-Gyeong Lee, Sang-Heon Lee, Woongchang Yoon, Jin-Hyun Kim, Ik-Young Choi, Hong-Kyu Choi, Jung-Kyung Moon, Namshin Kim, Soon-Chun Jeong.
Abstract
Despite the importance of soybean as a major crop, genome-wide variation and evolution of cultivated soybeans are largely unknown. Here, we catalogued genome variation in an annual soybean population by high-depth resequencing of 10 cultivated and 6 wild accessions and obtained 3.87 million high-quality single-nucleotide polymorphisms (SNPs) after excluding the sites with missing data in any accession. Nuclear genome phylogeny supported a single origin for the cultivated soybeans. We identified 10-fold longer linkage disequilibrium (LD) in the wild soybean relative to wild maize and rice. Despite the small population size, the long LD and large SNP data allowed us to identify 206 candidate domestication regions with significantly lower diversity in the cultivated, but not in the wild, soybeans. Some of the genes in these candidate regions were associated with soybean homologues of canonical domestication genes. However, several examples, which are likely specific to soybean or eudicot crop plants, were also observed. Consequently, the variation data identified in this study should be valuable for breeding and for identifying agronomically important genes in soybeans. However, the long LD of wild soybeans may hinder pinpointing causal gene(s) in the candidate regions.
Keywords: domestication; resequencing; soybean; variation
Year: 2013 PMID: 24271940 PMCID: PMC3989487 DOI: 10.1093/dnares/dst047
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Soybean population structure. (a) Morphological differences between domesticated soybean (left) and its wild relative (right). (b) Neighbour-joining phylogenetic tree of soybean nuclear genomes based on the high-quality SNPs, with evolutionary distances measured as p-distances. All branches except the one marked were supported by 100% bootstrap values from 1000 bootstrap replicates. Taxa in the neighbour-joining tree (right) are coloured by type: wild (red) and cultivated (blue) soybeans. Cultivated soybeans were tentatively grouped into C1, C2, and C3. (c) Bayesian clustering of samples using the STRUCTURE program. Each accession is represented by a vertical bar and each colour represents one population. An asterisk marks a narrow pink segment that is visible only when the figure is enlarged. The mean ln-likelihood values for K = 2 to 7 were −41525152, −37378506, −37280892, −32931343, −34405078, and −37518839, respectively.
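The caption states that tree distances were measured by the p-distance, i.e. the proportion of compared sites at which two accessions differ. As a minimal sketch, assuming each accession is represented as an equal-length string of SNP alleles (one character per site, no missing data, as in the filtered high-quality set), the pairwise matrix that a neighbour-joining program would take as input can be computed as follows. The function names and toy genotypes are hypothetical, not the authors' pipeline.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned SNP sites at which two accessions differ.

    Assumes equal-length allele strings (one character per site) and no
    missing calls.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def distance_matrix(samples: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named accessions."""
    return {
        (x, y): p_distance(samples[x], samples[y])
        for x, y in combinations(sorted(samples), 2)
    }
```

For example, `distance_matrix({"wild_1": "ACGTACGT", "wild_2": "ACGTACGA", "cult_1": "TCGAACTT"})` returns one entry per pair, keyed by alphabetically ordered accession names; building the tree itself would be left to a phylogenetics package.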
Summary of SNP and indel variation in cultivated and wild soybeans obtained from individual or multi-sample genotyping
| Group | Sample size | SNPs: total | SNPs: genic | SNPs: CDS^a | Indels: total | Indels: genic | Indels: CDS |
|---|---|---|---|---|---|---|---|
| Individual genotyping | | | | | | | |
| Cultivated | 10 | 4 182 059 | 618 493 | 139 107 | 799 470 | 131 032 | 7269 |
| Wild type | 6 | 7 626 486 | 2 276 394 | 252 245 | 1 447 750 | 246 653 | 13 003 |
| Total | 16 | 9 028 250 | 1 138 197 | 296 648 | 1 769 260 | 294 390 | 15 764 |
| Multi-sample genotyping | | | | | | | |
| Cultivated | 10 | 1 687 232 | 352 771 | 78 172 | 225 609 | 57 543 | 3972 |
| Wild type | 6 | 3 290 830 | 675 710 | 147 121 | 430 564 | 108 560 | 6824 |
| Total | 16 | 3 871 469 | 788 809 | 173 293 | 499 865 | 125 289 | 8222 |
^a Variations in coding sequences (CDS) of representative genes. Because alternative transcripts were not included, these numbers should be regarded as approximate.
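The multi-sample total of 3 871 469 SNPs appears to correspond to the 3.87 million high-quality SNPs reported in the abstract, which were obtained after excluding sites with missing data in any accession. A minimal sketch of such a complete-case filter follows. It assumes a hypothetical in-memory layout (site ID mapped to per-accession genotype strings) and treats `None` or the VCF missing-value forms (`.`, `./.`, `.|.`) as missing; it is not the authors' code.

```python
from typing import Mapping, Optional

Genotype = Optional[str]

# Calls treated as missing; ".", "./." and ".|." follow the VCF convention.
MISSING = {None, ".", "./.", ".|."}


def is_complete(calls: Mapping[str, Genotype], accessions: list[str]) -> bool:
    """True if every listed accession has a non-missing call at this site."""
    return all(calls.get(acc) not in MISSING for acc in accessions)


def filter_complete_sites(
    sites: Mapping[str, Mapping[str, Genotype]], accessions: list[str]
) -> dict[str, Mapping[str, Genotype]]:
    """Keep only sites called in all accessions.

    Hypothetical layout: site ID -> {accession name -> genotype string}.
    An accession absent from a site's mapping counts as missing.
    """
    return {
        site_id: calls
        for site_id, calls in sites.items()
        if is_complete(calls, accessions)
    }
```

With the 16 accessions passed as `accessions`, a site is retained only if all 16 have a call, which is why the multi-sample counts are much smaller than the individual-genotyping counts.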
Figure 2. Genome-wide analysis of nucleotide diversity and selection. (a) LOWESS curves of LD decay, measured as the squared correlation of allele frequencies (r²) against the distance between polymorphic sites, in cultivated (red) and wild (blue) soybeans. (b and c) Distributions of ROD values (b) and Z-transformed fixation index (FST) values (c) for cultivated relative to wild soybeans in 100-kb windows across the genome. ROD = 0.98 corresponds to –log10(1 − ROD) = 1.70. Chromosome numbers are indicated along the x-axis.
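Panel (a) plots r² between pairs of sites and panel (c) plots Z-transformed FST. The caption does not specify either estimator, so the sketch below assumes the standard haplotype-based form r² = D² / [pA(1 − pA) pB(1 − pB)], with D = pAB − pA·pB, and a plain standardization of per-window values to mean 0 and standard deviation 1. Treating each accession as a single haplotype is an assumption that suits largely homozygous inbred lines; the function names are hypothetical.

```python
from statistics import mean, pstdev


def ld_r2(site1: list[int], site2: list[int]) -> float:
    """r^2 = D^2 / (pA(1-pA) pB(1-pB)) for two biallelic sites.

    Each list holds one 0/1 allele code per accession; accessions are
    treated as haplotypes (assumes largely homozygous lines).
    """
    n = len(site1)
    if n == 0 or n != len(site2):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site1) / n
    p_b = sum(site2) / n
    p_ab = sum(1 for a, b in zip(site1, site2) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined at a monomorphic site")
    return d * d / denom


def z_transform(values: list[float]) -> list[float]:
    """Standardize per-window statistics (e.g. FST) to mean 0, SD 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical")
    return [(v - mu) / sd for v in values]
```

An LD decay curve like panel (a) would then come from computing `ld_r2` for many site pairs, binning by physical distance, and smoothing (for example with LOWESS).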
Figure 3. Features of candidate domestication genes homologous to cloned canonical domestication genes. (a) ROD (red) and average fixation index FST (blue) plotted for 100-kb windows across a 10-Mb region (upper panel) or for 10-kb windows across a 1-Mb region (lower panel) of chromosome 17, which harbours a cluster of candidate domestication genes homologous to cloned canonical domestication genes. Strong candidate domestication genes in the region are shown below panel (a). Grey boxes indicate a 180-kb chromosomal region with ROD > 0.98 in 100-kb windows and the corresponding region in 10-kb windows. For simplicity, –log10(1 − ROD) values ≥ 4 are plotted as 4. (b) Neighbour-joining phylogenetic tree showing the relationships among soybean TCP family proteins found in candidate domestication regions (CDRs) and functionally characterized representative members from other species described by Martín-Trillo and Cubas.[60] Bootstrap percentages are shown to indicate branch support. Only the TCP domain was used for the analysis. See Supplementary Fig. S9 for the phylogenetic relationships among all predicted TCP proteins of soybean and representative members of other species. (c) Structure of three candidate domestication proteins showing conserved domains (coloured boxes) and positions of amino acid substitutions caused by nsSNPs fixed in wild soybeans. Glyma12g29991 is homologous to qSH1, and Glyma05g03660 and Glyma17g14191 are homologous to OsMADS56. POX, domain of unknown function named ‘associated with HOX’; homeobox, BEL1-type homeobox; MADS, SRF-type MADS box; K, K-box region.
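The captions flag windows with ROD > 0.98, plot –log10(1 − ROD), and draw values of 4 or more at 4. ROD itself is not defined in this excerpt; the sketch below assumes the definition commonly used in crop domestication scans, ROD = 1 − π(cultivated) / π(wild), where π is nucleotide diversity in the window. It also reproduces the stated equivalence ROD = 0.98 ↔ 1.70, since −log10(0.02) ≈ 1.699. Constant and function names are hypothetical.

```python
import math

ROD_THRESHOLD = 0.98  # windows with ROD above this are boxed in grey
PLOT_CAP = 4.0        # transformed values >= 4 are drawn at 4


def reduction_of_diversity(pi_cultivated: float, pi_wild: float) -> float:
    """ROD = 1 - pi_cultivated / pi_wild (assumed definition)."""
    if pi_wild <= 0:
        raise ValueError("wild nucleotide diversity must be positive")
    return 1.0 - pi_cultivated / pi_wild


def is_candidate_window(rod: float, threshold: float = ROD_THRESHOLD) -> bool:
    """Flag a window whose ROD exceeds the threshold."""
    return rod > threshold


def neg_log10_one_minus(rod: float, cap: float = PLOT_CAP) -> float:
    """Plotting scale -log10(1 - ROD), capped at `cap` as in the captions."""
    remaining = 1.0 - rod
    if remaining <= 0:
        # ROD = 1 (no cultivated diversity): the transform is unbounded.
        return cap
    return min(-math.log10(remaining), cap)
```

Capping keeps windows with essentially zero cultivated diversity from dominating the y-axis while still showing them at the top of the plot.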
Coding sequence diversity and amino acid differences in 10 cultivated and 6 wild resequenced soybean genomes at domestication candidate genes homologous to canonical domestication genes
| Domestication gene | Soybean candidate gene | Length (bp): genic | Length (bp): CDS | ROD: 100-kb | 100-kb | Genetic diversity in genic region: Cul | Genetic diversity in genic region: Wild | Genetic diversity in genic region: Cul | Genetic diversity in genic region: Wild | Genetic diversity in CDS: Cul | Genetic diversity in CDS: Wild | Genetic diversity in CDS: Cul | Genetic diversity in CDS: Wild | Amino acid difference^a from reference: Wild (MAF) | Amino acid difference^a from reference: Cul (MAF) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | 5477 | 1635 | 0.982 | 0.452 | 0.2 | 2.42 | 0.37 | 0.26 | 0 | 0.37 | 0 | 0.27 | Q491H (0.5) | nd |
| | | 7859 | 2385 | 0.994 | 0.427 | 0.31 | 1.88 | 1.12 | 0.54 | 0 | 1.12 | 0 | 1.47 | N34D (0.17); P42S (0.17); H240L (0.17); C658T (0.17); N674H (1) | nd |
| | | 2332 | 507 | 0.995 | 0.667 | 0 | 8.75 | 1.05 | 0 | 0 | 1.05 | 0 | 0.86 | T648A (0.33) | nd |
| | | 1601 | 633 | 0.994 | 0.493 | 0.22 | 3.16 | 1.05 | 0.22 | 0.56 | 1.05 | 0.56 | 1.38 | V21_D22insSW (0.17); N76K (0.17) | H16D (0.2) |
| | | 2343 | 831 | 0.987 | 0.727 | 0 | 0.71 | 0 | 0 | 0 | 0 | 0 | 0 | nd | nd |
| | | 9994 | 615 | 0.997 | 0.470 | 0.19 | 2.87 | 0 | 0.25 | 0 | 0 | 0 | 0 | nd | nd |
| | | 7654 | 684 | 0.994 | 0.542 | 0.26 | 5.38 | 1.27 | 0.23 | 0.78 | 1.27 | 0.52 | 1.28 | I124V (0.17); T218A (1) | nd |
| | | 8420 | 663 | 0.991 | 0.546 | 0.002 | 4.18 | 1.41 | 0.004 | 0 | 1.41 | 0 | 1.32 | G177F (1); T180S (1); N182_V183insDAEL (1) | nd |
| | | 5779 | 1143 | 0.982 | 0.452 | 0.2 | 2.62 | 2.9 | 0.31 | 0 | 0.29 | 0 | 0.38 | nd | nd |
Cul, cultivated soybean; Wild, wild soybean; CDS, coding sequence; MAF, minor allele frequency; nd, not detected.
^a Amino acid substitutions are written X#Y, where X is the amino acid in the Williams 82 reference genome, # is the position of the substitution, and Y is the new amino acid. Insertions are written X#_Y#insAB, where X and Y are the Williams 82 reference amino acids flanking the insertion, the #s are their positions, and AB are the inserted amino acids.
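The notation defined in footnote a resembles a simplified form of HGVS protein nomenclature. A minimal parser for the two forms the footnote defines, plus the trailing minor allele frequency used in the table cells, could look like the following. The class, pattern, and function names are hypothetical, and only the one-letter amino acid codes used in this table are handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# X#Y, e.g. Q491H: reference residue, position, new residue.
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")
# X#_Y#insAB, e.g. N182_V183insDAEL: flanking residues, positions, insert.
INSERTION = re.compile(r"([A-Z])(\d+)_([A-Z])(\d+)ins([A-Z]+)")
# One table entry: the change, optionally followed by "(MAF)".
ENTRY = re.compile(r"\s*(?P<change>[^\s(]+)\s*(?:\((?P<maf>[\d.]+)\))?\s*")


@dataclass
class AminoAcidChange:
    kind: str             # "substitution" or "insertion"
    ref: str              # reference residue, or the two flanking residues
    start: int            # substitution position, or left flank position
    end: int              # same as start, or right flank position
    alt: str              # new residue, or the inserted residues
    maf: Optional[float]  # minor allele frequency, if given


def parse_change(text: str) -> AminoAcidChange:
    """Parse one entry such as 'Q491H (0.5)' or 'N182_V183insDAEL (1)'."""
    m = ENTRY.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognised entry: {text!r}")
    change, maf_text = m.group("change"), m.group("maf")
    maf = float(maf_text) if maf_text else None
    if (s := SUBSTITUTION.fullmatch(change)):
        ref, pos, alt = s.groups()
        return AminoAcidChange("substitution", ref, int(pos), int(pos), alt, maf)
    if (i := INSERTION.fullmatch(change)):
        left, lpos, right, rpos, inserted = i.groups()
        return AminoAcidChange(
            "insertion", left + right, int(lpos), int(rpos), inserted, maf
        )
    raise ValueError(f"unrecognised change: {change!r}")


def parse_cell(cell: str) -> list[AminoAcidChange]:
    """Split a cell like 'I124V (0.17); T218A (1)'; 'nd' gives []."""
    if cell.strip() == "nd":
        return []
    return [parse_change(part) for part in re.split(r"[;,]", cell) if part.strip()]
```

Splitting on either `;` or `,` tolerates the mixed separators that can appear in extracted table cells.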
Figure 4. Selection of three ROPGEF12 homeologs. (a) ROD (red) and average fixation index FST (blue) plotted for 100-kb windows (upper panel) or 10-kb windows (lower panel) across 10-Mb or 1-Mb regions, respectively, of three duplicated chromosomal segments from palaeopolyploidization, each harbouring a ROPGEF12 homologue. Grey boxes indicate the 140-kb (Gm07), 100-kb (Gm13), and 360-kb (Gm15) chromosomal regions with ROD > 0.98 in 100-kb windows and their corresponding regions in 10-kb windows. For simplicity, –log10(1 − ROD) values ≥ 4 are plotted as 4. (b) Homeologous (duplicated) relationships between genes on the three duplicated chromosomal segments. Predicted genes are indicated by coloured block arrows, except ROPGEF12 homologues, which are shown as black arrows. Grey boxes between genes indicate homeologs.
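The grey-boxed regions (for example 140 kb on Gm07) are built from 100-kb windows with ROD > 0.98 but are not whole multiples of the window size, which suggests overlapping windows; the step size is not stated in this excerpt. A sketch of merging overlapping or abutting above-threshold windows into contiguous regions follows, assuming a hypothetical list of `(start, end, ROD)` tuples for one chromosome with end coordinates exclusive. It illustrates the general interval-merging idea, not the authors' procedure.

```python
from typing import Iterable

Window = tuple[int, int, float]  # (start_bp, end_bp, ROD); end exclusive


def merge_high_rod_windows(
    windows: Iterable[Window], threshold: float = 0.98
) -> list[tuple[int, int]]:
    """Merge windows with ROD > threshold that overlap or touch into regions."""
    flagged = sorted((s, e) for s, e, rod in windows if rod > threshold)
    regions: list[tuple[int, int]] = []
    for start, end in flagged:
        if regions and start <= regions[-1][1]:
            # Overlaps or abuts the previous region: extend it.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            regions.append((start, end))
    return regions
```

For instance, with hypothetical 100-kb windows stepped by 20 kb, two consecutive flagged windows starting at 20 kb and 40 kb merge into a single 120-kb region spanning 20 to 140 kb.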