Herman De Beukelaer, Guy F Davenport, Veerle Fack.
Abstract
BACKGROUND: Core collections provide genebank curators and plant breeders a way to reduce the size of their collections and populations while minimizing the impact on genetic diversity and allele frequency. Many methods have been proposed to generate core collections, often using distance metrics to quantify the similarity of two accessions, based on genetic marker data or phenotypic traits. Core Hunter is a multi-purpose core subset selection tool that uses local search algorithms to generate subsets relying on one or more metrics, including several distance metrics and allelic richness.
Keywords: Core collections; Local search heuristics; Multi-objective
Year: 2018 PMID: 29855322 PMCID: PMC6092719 DOI: 10.1186/s12859-018-2209-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of random descent, parallel tempering, and a genetic algorithm, when maximizing the entry-to-nearest-entry criterion (E-NE). Mean values and standard deviations are reported for 10 independently sampled core collections
| | Rice | Coconut | Maize | Pea |
|---|---|---|---|---|
| Random descent | 0.1500 ± 1.83e-04 | 0.5748 ± 5.22e-04 | 0.4332 ± 2.73e-04 | 0.3337 ± 1.70e-03 |
| Parallel tempering | 0.1508 ± 1.40e-15 | 0.5759 ± 2.12e-06 | 0.4359 ± 8.56e-05 | 0.3412 ± 1.46e-04 |
| Genetic algorithm | 0.1506 ± 1.12e-04 | 0.5755 ± 1.04e-04 | 0.4346 ± 3.45e-04 | 0.3386 ± 8.00e-04 |
Fig. 1 Convergence curves for pea dataset. These curves show the E-NE value of the best found solution at each point in time during execution of random descent, parallel tempering, and the genetic algorithm, averaged over 10 independent runs, for the large pea dataset. The left plot reports the progress during the entire run with a runtime of 30 min, while the right plot is zoomed in on the first 40 s
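To make the comparison above concrete, the following is a minimal sketch of random descent maximizing E-NE. It assumes E-NE is the mean distance from each selected entry to its nearest other selected entry, and that a move swaps one selected accession for one unselected accession. The function names, the fixed step budget, and the toy distance matrix are illustrative assumptions, not Core Hunter's actual implementation.

```python
import random

def entry_to_nearest_entry(dist, core):
    """Mean distance from each core entry to its closest other core entry."""
    core = list(core)
    total = 0.0
    for i in core:
        total += min(dist[i][j] for j in core if j != i)
    return total / len(core)

def random_descent(dist, core_size, steps=2000, seed=0):
    """Accept a random single-swap move only if it improves E-NE (assumed move type)."""
    rng = random.Random(seed)
    n = len(dist)
    core = set(rng.sample(range(n), core_size))
    best = entry_to_nearest_entry(dist, core)
    for _ in range(steps):
        removed = rng.choice(sorted(core))
        added = rng.choice(sorted(set(range(n)) - core))
        candidate = (core - {removed}) | {added}
        value = entry_to_nearest_entry(dist, candidate)
        if value > best:  # strict improvement only; never accept a worse core
            core, best = candidate, value
    return core, best

# Toy data (hypothetical): accessions placed on a line, distance = absolute difference.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 2.0]
dist = [[abs(a - b) for b in points] for a in points]
core, value = random_descent(dist, core_size=3)
```

Because only improving moves are accepted, a search of this kind can stall in a local optimum, which is the usual motivation for methods such as parallel tempering that also accept some non-improving moves.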
Comparison of Core Hunter 2 and 3
| | E-NE | DMIN | Time (s) |
|---|---|---|---|
| Coconut | | | |
| CH2 | 0.552 ± 3.53e-2 | 0.501 ± 9.76e-2 | 27.6 ± 6.0 |
| CH3 | | 0.540 ± 0.00e-0 | 37.5 ± 7.9 |
| CH2L | 0.569 ± 5.91e-4 | | 31.0 ± 0.1 |
| Maize | | | |
| CH2 | 0.416 ± 1.52e-2 | 0.396 ± 2.46e-2 | 78.3 ± 10.6 |
| CH3 | | 0.409 ± 3.05e-3 | 74.3 ± 26.5 |
| CH2L | 0.429 ± 5.00e-4 | | 78.6 ± 2.0 |
| Pea | | | |
| CH2 | 0.219 ± 1.49e-3 | 0.000 ± 0.00e-0 | 85.6 ± 4.5 |
| CH3 | | 0.287 ± 1.34e-2 | 154.1 ± 49.7 |
| CH2L | 0.325 ± 8.21e-4 | | 802.3 ± 0.8 |
CH2 maximizes a weighted index including average and minimum pairwise distance, with equal weight, while CH3 maximizes E-NE. Mean E-NE, DMIN, runtime and corresponding standard deviations are reported for 10 independent executions. The highest obtained E-NE and DMIN value per dataset is shown in bold. CH3 was terminated when no improvements were found during 10 s. For CH2, two alternatives were considered: (a) the same stop condition as for CH3 (CH2); and (b) an absolute runtime limit that was empirically determined per dataset to ensure that the LR replica of MixRep terminated in each run (CH2L)
Fig. 2 Simultaneous optimization of entry-to-nearest-entry (E-NE) and accession-to-nearest-entry (A-NE) distance. These Pareto front approximations for Core Hunter 3 were obtained by sampling cores with varying weights α1 ∈ [0, 1] and α2 = 1 − α1 assigned to the E-NE and A-NE measures, respectively, with a step size of 0.05. The quality of the cores constructed by CH3 is compared with those obtained by GDOpt and SimEli, in terms of both objective functions. All reported values are averages of 10 independently sampled cores with the same settings
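The weight sweep described in this caption can be sketched as follows. This is a simplified illustration under stated assumptions: A-NE is taken as the mean distance from every accession to its nearest core entry, the two measures are combined as α1·E-NE − α2·A-NE (A-NE is minimized, hence the minus sign), no normalization of the objectives is applied, and an exhaustive search over a tiny toy dataset stands in for the local search. All names and data are hypothetical.

```python
from itertools import combinations

def e_ne(dist, core):
    """Mean distance from each core entry to its closest other core entry."""
    return sum(min(dist[i][j] for j in core if j != i) for i in core) / len(core)

def a_ne(dist, core):
    """Mean distance from every accession to its closest core entry."""
    n = len(dist)
    return sum(min(dist[i][j] for j in core) for i in range(n)) / n

def pareto_sweep(dist, core_size, step=0.05):
    """For each weight pair, pick the core maximizing a1*E-NE - a2*A-NE."""
    candidates = list(combinations(range(len(dist)), core_size))
    n_steps = round(1 / step)
    front = []
    for k in range(n_steps + 1):
        a1 = k / n_steps  # weight on E-NE (maximized)
        a2 = 1 - a1       # weight on A-NE (minimized)
        best = max(candidates,
                   key=lambda c: a1 * e_ne(dist, c) - a2 * a_ne(dist, c))
        front.append((round(a1, 2), e_ne(dist, best), a_ne(dist, best)))
    return front

# Toy data (hypothetical): accessions on a line, distance = absolute difference.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 2.0]
dist = [[abs(a - b) for b in points] for a in points]
front = pareto_sweep(dist, core_size=3)
```

Each tuple in `front` holds the weight α1 and the resulting (E-NE, A-NE) pair; plotting the second against the third component gives a Pareto front approximation of the kind shown in the figure.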
Fig. 3 Simultaneous maximization of entry-to-nearest-entry distance (E-NE) and expected heterozygosity (HE). These Pareto front approximations for Core Hunter 3 were obtained by sampling cores with varying weights α1 ∈ [0, 1] and α2 = 1 − α1 assigned to the E-NE and HE measures, respectively, with a step size of 0.05. The quality of the cores constructed by CH3 is compared with those obtained by GDOpt and SimEli, in terms of both objective functions. All reported values are averages of 10 independently sampled cores with the same settings. The rice dataset is excluded here because expected heterozygosity can only be evaluated for genotypic data
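As an illustration of why HE requires genotypic data, the sketch below computes expected heterozygosity from per-accession allele frequencies. It assumes the common definition in which allele frequencies are averaged over the core and HE is the mean over markers of one minus the sum of squared mean frequencies. The data layout and function name are hypothetical and may differ from what Core Hunter uses internally.

```python
def expected_heterozygosity(freqs, core):
    """freqs[a][m] lists the allele frequencies of accession a at marker m.

    Assumed definition: average over markers of 1 - sum_k p_k^2, where p_k is
    the mean frequency of allele k over the accessions in the core.
    """
    n_markers = len(freqs[0])
    total = 0.0
    for m in range(n_markers):
        n_alleles = len(freqs[core[0]][m])
        mean = [sum(freqs[a][m][k] for a in core) / len(core)
                for k in range(n_alleles)]
        total += 1.0 - sum(p * p for p in mean)
    return total / n_markers

# Toy data (hypothetical): one biallelic marker, two accessions that are
# homozygous for different alleles.
freqs = [
    [[1.0, 0.0]],
    [[0.0, 1.0]],
]
he_single = expected_heterozygosity(freqs, [0])
he_both = expected_heterozygosity(freqs, [0, 1])
```

In this toy case a core holding only one accession has no allelic variation at the marker, whereas a core holding both accessions carries the two alleles at equal frequency.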
Average execution times (seconds) of GDOpt, both SimEli implementations and CH3 for 10 independent samples from each dataset. Three configurations are considered for CH3: (a) maximize E-NE; (b) minimize A-NE; and (c) maximize HE
| | Rice | Coconut | Maize | Pea |
|---|---|---|---|---|
| GDOpt | 14.9 | 7.1 | 91.2 | 350.1 |
| SimEli-A-RA | 7.6 | 7.5 | 11.5 | 514.7 |
| SimEli-HE | - | 15.9 | 78.0 | 502.3 |
| CH3 E-NE | 45.8 | 37.5 | 74.3 | 154.1 |
| CH3 A-NE | 74.6 | 55.7 | 133.1 | 86.7 |
| CH3 HE | - | 16.6 | 40.2 | 62.8 |