Seongmun Jeong, Jae-Yoon Kim, Soon-Chun Jeong, Sung-Taeg Kang, Jung-Kyung Moon, Namshin Kim.
Abstract
Selecting core subsets from plant genotype datasets is important for improving cost-effectiveness and shortening the time required for genome-wide association studies (GWAS) and genomics-assisted breeding of crop species. Recently, large numbers of genetic markers (>100,000 single nucleotide polymorphisms, SNPs) have been identified from high-density SNP arrays and next-generation sequencing (NGS) data. However, no software is available for selecting an efficient and consistent core subset from such large datasets. Software is needed that can coherently extract the genetically important samples in a population. Here we present a new program, GenoCore, which quickly and efficiently finds a core subset representing the entire population. We introduce simple coverage and diversity scores, which reflect genotype errors and genetic variation and help to select samples rapidly and accurately from crop genotype datasets. We compare our method with other core collection software on example datasets to validate its performance in terms of genetic distance, diversity, coverage, required system resources, and the number of selected samples. GenoCore selects the smallest, most consistent, and most representative core collection from all samples, using less memory with more efficient scores, and shows greater genetic coverage than the other software tested. GenoCore is written in R and is available online, with an example dataset and test results, at https://github.com/lovemun/Genocore.
Year: 2017 PMID: 28727806 PMCID: PMC5519076 DOI: 10.1371/journal.pone.0181420
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Datasets.
| Dataset | SNP chip | # of markers | # of samples |
|---|---|---|---|
| Rice 1.5K | Illumina GoldenGate Assay | 1,536 | 395 |
| Wheat | Affymetrix Axiom 35K SNP array | 35,143 | 556 |
| Rice 700K | High Density Rice Assay 700K SNPs | 700,001 | 1,108 |
Fig 1. Increase in coverage values versus number of selected samples for each software.
(A) Rice 1.5K dataset, (B) wheat dataset.
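Curves like those in Fig 1 plot coverage against the number of samples picked so far. The following is a minimal sketch of how such a curve could be produced with a greedy rule that always adds the sample contributing the most unseen genotype calls. It is not GenoCore's actual scoring. It assumes that coverage means the percentage of distinct per-marker genotype calls in the full dataset that also appear in the subset, and the names `observed_calls`, `coverage`, and `greedy_coverage_curve` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input layout: sample name -> one genotype call per marker,
# with "" standing for a missing call.
Genotypes = Dict[str, List[str]]


def observed_calls(genotypes: Genotypes, samples: Iterable[str]) -> Set[Tuple[int, str]]:
    """Collect the distinct (marker index, genotype call) pairs seen in `samples`."""
    seen: Set[Tuple[int, str]] = set()
    for name in samples:
        for marker, call in enumerate(genotypes[name]):
            if call:  # ignore missing calls
                seen.add((marker, call))
    return seen


def coverage(genotypes: Genotypes, subset: Iterable[str]) -> float:
    """Percentage of all distinct calls in the dataset that the subset also contains."""
    total = observed_calls(genotypes, genotypes)
    if not total:
        return 0.0
    return 100.0 * len(observed_calls(genotypes, subset)) / len(total)


def greedy_coverage_curve(genotypes: Genotypes, target: float = 100.0) -> List[Tuple[str, float]]:
    """Add samples one at a time, recording the coverage reached after each pick."""
    total = observed_calls(genotypes, genotypes)
    remaining = sorted(genotypes)  # sorted so that ties break deterministically
    covered: Set[Tuple[int, str]] = set()
    curve: List[Tuple[str, float]] = []
    while remaining and total:
        # Pick the sample that adds the most calls not yet covered.
        best = max(remaining, key=lambda s: len(observed_calls(genotypes, [s]) - covered))
        gain = observed_calls(genotypes, [best]) - covered
        if not gain:
            break  # no remaining sample adds anything new
        covered |= gain
        remaining.remove(best)
        reached = 100.0 * len(covered) / len(total)
        curve.append((best, reached))
        if reached >= target:
            break
    return curve


if __name__ == "__main__":
    toy = {
        "s1": ["AA", "CC", "GG"],
        "s2": ["AA", "CT", "GG"],
        "s3": ["TT", "CC", "GG"],
        "s4": ["AA", "CC", ""],
    }
    for name, reached in greedy_coverage_curve(toy):
        print(name, reached)
```

Plotting the second element of each tuple against its position in the list gives a coverage-versus-sample-count curve of the kind shown in the figure. Lowering `target` (for example to 99) stops the selection earlier and yields a smaller subset.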
Core collection results (rice 1.5K dataset).
| Software | # of samples | MR | Min. MR | SH | CV |
|---|---|---|---|---|---|
| GenoCore | | 0.63968 | | 7.7815 | |
| Core Hunter | | | 0.10717 | | 98.252 |
| MSTRAT | | 0.63339 | 0.10610 | 7.7361 | 98.677 |
| PowerCore | 80 | 0.63337 | 0.30438 | 7.8010 | |
| Random | | 0.61619 | 0.13179 | 7.7485 | 84.202 |
| Raw data | 395 | 0.61614 | 0.04793 | 7.7572 | |
Core collection results (wheat dataset).
| Software | # of samples | MR | Min. MR | SH | CV |
|---|---|---|---|---|---|
| GenoCore | | | | | |
| Core Hunter | | 0.60400 | 0.25343 | 11.1257 | 96.429 |
| MSTRAT | | 0.54568 | 0.21763 | 10.9978 | 89.692 |
| Random | | 0.54070 | 0.20502 | 10.9858 | 88.153 |
| Raw data | | 0.51715 | 0.10548 | 10.9053 | |
Fig 2. Principal component analysis.
(A) Rice 1.5K dataset, (B) wheat dataset.
Fig 3. Venn diagram (rice 1.5K dataset).
Core collection results and system resources (rice 700K dataset).
| Software | Input file size | Used memory | # of samples | CV | Runtime |
|---|---|---|---|---|---|
| GenoCore | 1.6 Gb | 53 Gb | 62 | 99.000 | 7 h 15 min |
Fig 4. Phenotype density (rice 700K dataset).
Fig 5. Histogram of allele frequency (rice 1.5K dataset).
Frequency of the reference allele in the entire and core samples. (A) Rice 1.5K dataset, (B) wheat dataset. The distributions are similar.
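A comparison like the one in Fig 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: genotypes are assumed to be coded as the number of reference alleles per diploid sample (0, 1, or 2, with `None` for a missing call), and the function names `ref_allele_frequency`, `frequencies_per_marker`, and `histogram` are hypothetical.

```python
from typing import List, Optional, Sequence

Call = Optional[int]  # 0, 1 or 2 reference alleles; None = missing call


def ref_allele_frequency(calls: Sequence[Call]) -> Optional[float]:
    """Reference allele frequency at one marker, ignoring missing calls."""
    called = [c for c in calls if c is not None]
    if not called:
        return None  # frequency is undefined when every call is missing
    return sum(called) / (2 * len(called))  # two alleles per diploid sample


def frequencies_per_marker(matrix: Sequence[Sequence[Call]],
                           rows: Sequence[int]) -> List[Optional[float]]:
    """Per-marker reference allele frequency over the chosen sample rows."""
    n_markers = len(matrix[0])
    return [ref_allele_frequency([matrix[r][m] for r in rows])
            for m in range(n_markers)]


def histogram(freqs: Sequence[Optional[float]], bins: int = 10) -> List[int]:
    """Count frequencies in equal-width bins over [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * bins
    for f in freqs:
        if f is None:
            continue
        counts[min(int(f * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    # Rows are samples, columns are markers (toy data).
    matrix = [
        [2, 0, 1],
        [2, 1, None],
        [0, 0, 1],
        [2, 2, 1],
    ]
    entire = frequencies_per_marker(matrix, range(len(matrix)))
    core = frequencies_per_marker(matrix, [0, 2])  # hypothetical core subset
    print(histogram(entire, bins=4))
    print(histogram(core, bins=4))
```

Comparing the two bin-count lists (or plotting them side by side) shows how closely the core subset tracks the allele frequency distribution of the entire collection.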
System resources (rice 1.5K dataset).
| Software | Input file size | Runtime | Used memory |
|---|---|---|---|
| GenoCore | 1.1 Mb | <1 min | 0.2 Gb |
| MSTRAT | 2.8 Mb | <1 min | 0.7 Gb |
| PowerCore | 1.1 Mb | 5 min, 40 sec | 0.08 Gb |
| Core Hunter | 2.8 Mb | <1 min | 4.2 Gb |
System resources (wheat dataset).
| Software | Input file size | Runtime | Used maximum memory |
|---|---|---|---|
| GenoCore | 39 Mb | 10 min | 1.6 Gb |
| MSTRAT | 130 Mb | 10 min | 9.7 Gb |
| Core Hunter | 130 Mb | 26 min | 14 Gb |