Qijian Song, David L Hyten, Gaofeng Jia, Charles V Quigley, Edward W Fickus, Randall L Nelson, Perry B Cregan.
Abstract
The objective of this research was to identify single nucleotide polymorphisms (SNPs) and to develop an Illumina Infinium BeadChip that contained over 50,000 SNPs from soybean (Glycine max L. Merr.). A total of 498,921,777 reads 35-45 bp in length were obtained from DNA sequence analysis of reduced representation libraries from several soybean accessions which included six cultivated and two wild soybean (G. soja Sieb. et Zucc.) genotypes. These reads were mapped to the soybean whole genome sequence and 209,903 SNPs were identified. After applying several filters, a total of 146,161 of the 209,903 SNPs were determined to be ideal candidates for Illumina Infinium II BeadChip design. To equalize the distance between selected SNPs, increase assay success rate, and minimize the number of SNPs with low minor allele frequency, an iteration algorithm based on a selection index was developed and used to select 60,800 SNPs for Infinium BeadChip design. Of the 60,800 SNPs, 50,701 were targeted to euchromatic regions and 10,000 to heterochromatic regions of the 20 soybean chromosomes. In addition, 99 SNPs were targeted to unanchored sequence scaffolds. Of the 60,800 SNPs, a total of 52,041 passed Illumina's manufacturing phase to produce the SoySNP50K iSelect BeadChip. Validation of the SoySNP50K chip with 96 landrace genotypes, 96 elite cultivars and 96 wild soybean accessions showed that 47,337 SNPs were polymorphic and generated successful SNP allele calls. In addition, 40,841 of the 47,337 SNPs (86%) had minor allele frequencies ≥ 10% among the landraces, elite cultivars and the wild soybean accessions. A total of 620 and 42 candidate regions which may be associated with domestication and recent selection were identified, respectively. 
The SoySNP50K iSelect SNP BeadChip will be a powerful tool for characterizing soybean genetic diversity and linkage disequilibrium, and for constructing high resolution linkage maps to improve the soybean whole genome sequence assembly.
Year: 2013 PMID: 23372807 PMCID: PMC3555945 DOI: 10.1371/journal.pone.0054985
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Illumina GAII DNA sequence analysis of eight genotypes and mixed DNA.
| Genotypes | Number of lanes | Number of reads | Total number of bases |
| PI 468916 and PI 479752 | 23 | 120,581,402 | 4,258,255,310 |
| Essex | 18 | 166,829,261 | 6,473,293,963 |
| Evans | 4 | 11,638,328 | 360,108,084 |
| Archer | 3 | 7,759,202 | 277,664,148 |
| Minsoy | 3 | 26,101,751 | 881,054,603 |
| Noir 1 | 3 | 39,234,960 | 1,404,545,136 |
| Peking | 2 | 35,012,819 | 1,237,429,040 |
| Mixed DNA | 15 | 91,764,054 | 3,421,615,732 |
| Total | 71 | 498,921,777 | 18,313,966,016 |
Validation rate of SNPs based on the Sanger sequence analysis of a random set of 767 SNP loci.
| Priority class | Total number of loci | Number of loci with null or multiple amplicons | Number of loci with good sequence | Number of SNPs validated | Number of SNPs not validated | Validation rate (%) |
| A | 89 | 17 | 72 | 72 | 0 | 100 |
| B | 233 | 43 | 190 | 158 | 32 | 83 |
| C | 390 | 73 | 317 | 244 | 73 | 77 |
| D | 55 | 9 | 46 | 31 | 15 | 67 |
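The validation rates in the last column appear to be the number of validated SNPs divided by the number of loci with good sequence, so loci with null or multiple amplicons are excluded from the denominator. A minimal sketch that recomputes them from the table rows (the function and variable names are illustrative, not from the paper):

```python
# Rows copied from the table above:
# (priority class, loci with good sequence, SNPs validated)
rows = [("A", 72, 72), ("B", 190, 158), ("C", 317, 244), ("D", 46, 31)]


def validation_rate(good_sequence: int, validated: int) -> float:
    """Percentage of loci with good sequence whose SNP was confirmed."""
    return 100.0 * validated / good_sequence


# Round to whole percentages, as the table does.
rates = {cls: round(validation_rate(good, val)) for cls, good, val in rows}
print(rates)
```

The rounded values agree with the Validation rate (%) column for all four priority classes.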
Number and density of selected SNPs and the 146,161 SNPs in euchromatic and heterochromatic regions of each soybean chromosome.
| Chromosome | Number of SNPs | Number of selected SNPs in the euchromatic regions | Number of selected SNPs in the heterochromatic regions | Sequence length of euchromatic regions (bp) | Sequence length of heterochromatic regions (bp) | SNP density in the euchromatic regions (SNPs/Mb) | SNP density in the heterochromatic regions (SNPs/Mb) | Number of SNPs of the 146,161 in the euchromatic regions | Number of SNPs of the 146,161 in the heterochromatic regions |
| Gm1 | 2489 | 1652 | 837 | 14841727 | 41073868 | 111.3 | 20.4 | 2688 | 5367 |
| Gm2 | 3445 | 2929 | 516 | 26316426 | 25340287 | 111.3 | 20.4 | 4099 | 2852 |
| Gm3 | 2691 | 2102 | 589 | 18879713 | 28901363 | 111.3 | 20.4 | 3245 | 4276 |
| Gm4 | 2718 | 2099 | 619 | 18855914 | 30387938 | 111.3 | 20.4 | 3197 | 5888 |
| Gm5 | 2927 | 2537 | 390 | 22797076 | 19139428 | 111.3 | 20.4 | 3062 | 1286 |
| Gm6 | 3041 | 2458 | 583 | 22083366 | 28639455 | 111.3 | 20.4 | 2908 | 4851 |
| Gm7 | 3421 | 3073 | 348 | 27609531 | 17073626 | 111.3 | 20.4 | 4586 | 1424 |
| Gm8 | 3795 | 3473 | 322 | 31208512 | 15787020 | 111.3 | 20.4 | 4010 | 2062 |
| Gm9 | 2556 | 1960 | 596 | 17602854 | 29240896 | 111.3 | 20.4 | 2828 | 5739 |
| Gm10 | 3241 | 2696 | 545 | 24219274 | 26750361 | 111.3 | 20.4 | 3435 | 4328 |
| Gm11 | 2610 | 2308 | 302 | 24367505 | 14805285 | 94.7 | 20.4 | 2441 | 1081 |
| Gm12 | 2378 | 1910 | 468 | 17140105 | 22973035 | 111.3 | 20.4 | 2516 | 2340 |
| Gm13 | 3591 | 3288 | 303 | 29558651 | 14850320 | 111.3 | 20.4 | 4286 | 2517 |
| Gm14 | 2863 | 2265 | 598 | 20344958 | 29366246 | 111.3 | 20.4 | 3304 | 3014 |
| Gm15 | 3164 | 2603 | 561 | 23378504 | 27560656 | 111.3 | 20.4 | 4398 | 5224 |
| Gm16 | 2370 | 1969 | 401 | 17708632 | 19688753 | 111.3 | 20.4 | 3042 | 3155 |
| Gm17 | 2694 | 2253 | 441 | 20240737 | 21666037 | 111.3 | 20.4 | 2957 | 4228 |
| Gm18 | 4618 | 4099 | 519 | 36632197 | 25675943 | 111.3 | 20.4 | 8814 | 4680 |
| Gm19 | 3520 | 3047 | 473 | 27373488 | 23215953 | 111.3 | 20.4 | 4371 | 4414 |
| Gm20 | 2569 | 1980 | 589 | 17784173 | 28988994 | 111.3 | 20.4 | 2897 | 3546 |
| Total | 60701 | 50701 | 10000 | 4.59E+08 | 4.91E+08 | | | 73084 | 72272 |
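The density columns follow from the selected SNP counts and the region lengths: density in SNPs/Mb is the SNP count divided by the region length in megabases. A short sketch that reproduces a few of the table's values (the helper name is an assumption made for illustration):

```python
def snp_density_per_mb(n_snps: int, region_length_bp: int) -> float:
    """SNPs per megabase for a region of the given length in base pairs."""
    return n_snps / (region_length_bp / 1_000_000)


# Values copied from the Gm1 and Gm11 rows of the table above.
gm1_euchromatic = snp_density_per_mb(1652, 14841727)
gm1_heterochromatic = snp_density_per_mb(837, 41073868)
gm11_euchromatic = snp_density_per_mb(2308, 24367505)

print(round(gm1_euchromatic, 1), round(gm1_heterochromatic, 1), round(gm11_euchromatic, 1))
```

Gm11 is the only chromosome whose euchromatic density (94.7 SNPs/Mb) falls below the 111.3 SNPs/Mb shared by the other chromosomes in the table.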
Figure 1. Distribution of distances between adjacent pre-selected and selected SNPs in euchromatic and heterochromatic regions.
(A) Selected SNPs (50,701) in euchromatic regions of SoySNP50K and the pre-selected SNPs (73,084) in euchromatic regions. (B) Selected SNPs (10,000) in heterochromatic regions and pre-selected SNPs (72,272) in heterochromatic regions.
Association of SNP selection index score, Illumina Infinium design score and priority group with the rate of SNPs with successful allele calls.
| SNP selection index score (Ii) | Number of SNPs with successful allele calls | Number of SNPs not called | Rate of SNPs with successful allele calls | Illumina Infinium design score (Di) | Number of SNPs with successful allele calls | Number of SNPs not called | Rate of SNPs with successful allele calls | Priority group | Number of SNPs with successful allele calls | Number of SNPs not called | Rate of SNPs with successful allele calls |
| ≥0.9 | 5,974 | 210 | 0.97 | ≥0.9 | 22,359 | 1,326 | 0.94 | A | 10,943 | 582 | 0.95 |
| ≥0.8 & <0.9 | 7,630 | 410 | 0.95 | ≥0.8 & <0.9 | 10,316 | 800 | 0.93 | B | 16,163 | 1,536 | 0.91 |
| ≥0.7 & <0.8 | 10,995 | 749 | 0.94 | ≥0.7 & <0.8 | 5,755 | 605 | 0.90 | C | 16,505 | 2,003 | 0.89 |
| ≥0.6 & <0.7 | 11,102 | 1,000 | 0.92 | ≥0.6 & <0.7 | 3,649 | 460 | 0.89 | D | 3,835 | 474 | 0.89 |
| ≥0.5 & <0.6 | 5,507 | 663 | 0.89 | ≥0.5 & <0.6 | 2,414 | 458 | 0.84 | | | | |
| ≥0.4 & <0.5 | 3,382 | 636 | 0.84 | ≥0.4 & <0.5 | 2,953 | 946 | 0.76 | | | | |
| <0.4 | 2,856 | 927 | 0.75 | | | | | | | | |
| Total | 47,446 | 4,595 | | | 47,446 | 4,595 | | | 47,446 | 4,595 | |
Distribution of SNP minor allele frequency in elite cultivar, landrace and wild soybean populations.
| Minor allele frequency | Elite population | Landrace population | Wild population | All 288 genotypes |
| <0.05 | 13,114 (27.9%) | 8,060 (17.1%) | 8,833 (19.3%) | 3,294 (7.0%) |
| ≥0.05 & <0.1 | 3,358 (7.1%) | 4,432 (9.4%) | 5,186 (11.3%) | 3,202 (6.8%) |
| ≥0.1 & <0.2 | 6,377 (13.6%) | 8,813 (18.8%) | 9,300 (20.3%) | 7,837 (16.6%) |
| ≥0.2 & <0.3 | 7,665 (16.3%) | 8,966 (19.1%) | 7,913 (17.3%) | 10,016 (21.2%) |
| ≥0.3 & <0.4 | 8,393 (17.9%) | 8,511 (18.1%) | 7,094 (15.5%) | 10,890 (23.0%) |
| ≥0.4 & ≤0.5 | 8,085 (17.2%) | 8,210 (17.5%) | 7,464 (16.3%) | 11,999 (25.3%) |
| Total | 46,992 | 46,992 | 45,790 | 47,337 |
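To make the frequency classes concrete, the sketch below computes the minor allele frequency of a single biallelic SNP and assigns it to one of the classes used in the table. The input encoding (one allele call per inbred genotype, `None` for a missing call) and both function names are assumptions for illustration; the record does not describe how the authors stored or filtered their calls.

```python
from collections import Counter

# Upper bounds and labels of the frequency classes used in the table above.
MAF_CLASSES = [
    (0.05, "<0.05"),
    (0.1, ">=0.05 & <0.1"),
    (0.2, ">=0.1 & <0.2"),
    (0.3, ">=0.2 & <0.3"),
    (0.4, ">=0.3 & <0.4"),
]
LAST_CLASS = ">=0.4 & <=0.5"


def minor_allele_frequency(alleles):
    """MAF of a biallelic SNP; missing calls (None) are ignored."""
    counts = Counter(a for a in alleles if a is not None)
    if not counts:
        raise ValueError("no called alleles")
    if len(counts) == 1:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / sum(counts.values())


def maf_class(maf):
    """Return the label of the frequency class that contains maf."""
    for upper_bound, label in MAF_CLASSES:
        if maf < upper_bound:
            return label
    return LAST_CLASS


# Hypothetical calls for one SNP across ten genotypes plus one missing call.
calls = ["A"] * 9 + ["G"] + [None]
maf = minor_allele_frequency(calls)
print(maf, maf_class(maf))
```

Because each class includes its lower bound, a SNP with a frequency of exactly 0.1 is counted in the "≥0.1 & <0.2" class.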
Distribution of genetic distance based upon the proportion of polymorphic SNPs between pairs of elite cultivar, landrace and wild soybean genotypes.
| Genetic distance | Number of pairs among elite genotypes | Number of pairs among landrace genotypes | Number of pairs among wild genotypes |
| <0.1 | 1 (0.0%) | 4 (0.1%) | 51 (1.1%) |
| ≥0.1 & <0.2 | 68 (1.5%) | 71 (1.6%) | 202 (4.4%) |
| ≥0.2 & <0.3 | 2,319 (50.9%) | 1,377 (30.2%) | 2,120 (46.5%) |
| ≥0.3 & <0.4 | 2,172 (47.6%) | 2,861 (62.7%) | 2,167 (47.5%) |
| ≥0.4 & ≤0.5 | 0 (0.0%) | 247 (5.4%) | 20 (0.4%) |
| Total | 4,560 | 4,560 | 4,560 |
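Each population contains 96 genotypes, which gives 96 × 95 / 2 = 4,560 pairs and matches the Total row. The sketch below shows one way to compute a distance defined as the proportion of SNPs that differ between two genotypes. The call encoding and the handling of missing calls are assumptions; the record only states that the distance is based on the proportion of polymorphic SNPs between pairs.

```python
from itertools import combinations
from math import comb


def genetic_distance(calls_a, calls_b):
    """Proportion of SNPs with different calls, skipping missing (None) calls."""
    compared = 0
    differing = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no SNPs called in both genotypes")
    return differing / compared


# Number of unordered pairs among 96 genotypes.
n_pairs = comb(96, 2)

# Hypothetical calls for three genotypes at four SNPs.
genotypes = {
    "g1": ["A", "A", "G", "T"],
    "g2": ["A", "A", "G", "C"],
    "g3": ["C", None, "G", "C"],
}
distances = {
    (x, y): genetic_distance(genotypes[x], genotypes[y])
    for x, y in combinations(sorted(genotypes), 2)
}
print(n_pairs, distances)
```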
Number of 100 kb regions across the 20 soybean chromosomes with Fst ≥0.6 in comparisons of G. soja vs. landraces and landraces vs. elite cultivars.
| Chromosome | Number of regions with Fst ≥0.6 in G. soja vs. landrace | Number of regions with Fst ≥0.6 in landrace vs. elite and with 100 kb interval to the adjacent regions |
| Gm01 | 48 | 3 |
| Gm02 | 21 | |
| Gm03 | 26 | |
| Gm04 | 11 | 11 |
| Gm05 | 72 | |
| Gm06 | 17 | |
| Gm07 | 41 | 1 |
| Gm08 | 26 | 3 |
| Gm09 | 15 | |
| Gm10 | 35 | |
| Gm11 | 32 | |
| Gm12 | 57 | 20 |
| Gm13 | 28 | |
| Gm14 | 51 | |
| Gm15 | 2 | |
| Gm16 | 8 | 1 |
| Gm17 | 11 | 2 |
| Gm18 | 14 | |
| Gm19 | 44 | |
| Gm20 | 61 | 1 |
| Total | 620 | 42 |
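The record does not say which Fst estimator was used or how values were summarized within each 100 kb region. As a rough illustration only, the sketch below uses the textbook two-population form Fst = (Ht − Hs) / Ht with equal population weights and flags windows whose mean per-SNP Fst reaches 0.6. The function names, the input layout, and the use of a window mean are all assumptions.

```python
from collections import defaultdict


def fst_two_populations(p1, p2):
    """Fst for one biallelic SNP from the frequency of the same allele
    in two populations, weighting both populations equally."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # allele fixed or absent in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def windows_above_threshold(snps, window_bp=100_000, threshold=0.6):
    """snps: iterable of (position_bp, p1, p2) on one chromosome.
    Returns the sorted start positions of windows whose mean Fst >= threshold."""
    by_window = defaultdict(list)
    for position, p1, p2 in snps:
        by_window[position // window_bp].append(fst_two_populations(p1, p2))
    return sorted(
        index * window_bp
        for index, values in by_window.items()
        if sum(values) / len(values) >= threshold
    )


# Hypothetical allele frequencies at three SNPs in two populations.
example = [(10_000, 1.0, 0.0), (50_000, 0.9, 0.1), (150_000, 0.5, 0.5)]
flagged = windows_above_threshold(example)
print(flagged)
```

In this toy input only the first window is flagged: its two SNPs are strongly differentiated, while the SNP in the second window has identical frequencies in both populations.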
Figure 2. Fst values of G. soja vs. landraces and elite cultivars vs. landraces along chromosomes Gm14 and Gm04.
(A) Significant Fst values on Gm14 between SSR markers Sat_355 and Satt474 may contain loci associated with soybean domestication. (B) Significant Fst values between SSR markers Satt396 and Sat_140 may contain loci that were under selection in N. American soybean breeding programs.