Kira Villiers¹, Eric Dinglasan¹, Ben J Hayes¹, Kai P Voss-Fels¹,².
Abstract
Simulation tools are key to designing and optimizing breeding programs, which are multiyear, high-effort endeavors. Tools that operate on real genotypes and integrate easily with other analysis software can guide users toward crossing decisions that best balance genetic gain against the genetic diversity required to maintain gains in the future. Here, we present genomicSimulation, a fast and flexible tool for the stochastic simulation of crossing and selection based on real genotypes. It is written entirely in C for high execution speed, has minimal dependencies, and is available as an R package for integration with R's broad range of analysis and visualization tools. Comparing a simulated recreation of a breeding program against a real data set demonstrates that the tool's simulated offspring correctly show key population features, such as genomic relationships and approximate linkage disequilibrium patterns. Both versions of genomicSimulation are freely available on GitHub: the R package at https://github.com/vllrs/genomicSimulation/ and the C library at https://github.com/vllrs/genomicSimulationC/.
Keywords: C language; R package; breeding program design; breeding program simulation; genomic selection; meiosis simulation
Year: 2022 PMID: 36053200 PMCID: PMC9526041 DOI: 10.1093/g3journal/jkac216
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Fig. 1. The meiosis simulation procedure in genomicSimulation, which uses a count-location strategy. The following steps are performed for every homologous pair of chromosomes (a sample procedure is marked on the diagrams). a) First, the number of crossovers is drawn from a Poisson distribution whose expectation equals the length of the chromosome in Morgans. b) Next, the position of each crossover is drawn uniformly along the length of the chromosome. c) A final random draw determines which of the 2 resulting gametes is chosen. d) The gamete is created by reading along the chosen starting chromosome and swapping to the other chromosome of the pair whenever a crossover point is encountered.
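The count-location procedure in Fig. 1 can be sketched in C, the language of the genomicSimulationC library. This is a minimal illustration under stated assumptions, not genomicSimulation's own code. The function names, the array layout (one `char` allele per marker per homolog, with marker positions in centiMorgans), the xorshift64* generator, Knuth's Poisson sampler and the `MAX_CROSSOVERS` cap were all chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CROSSOVERS 64 /* hypothetical cap, far above typical counts */

/* Hypothetical xorshift64* generator so the sketch needs no libraries.
 * The state must be seeded with a non-zero value. */
typedef struct { uint64_t state; } Rng;

static double rng_uniform(Rng *r) {
    r->state ^= r->state >> 12;
    r->state ^= r->state << 25;
    r->state ^= r->state >> 27;
    uint64_t x = r->state * 0x2545F4914F6CDD1DULL;
    return (double)(x >> 11) / 9007199254740992.0; /* top 53 bits / 2^53, in [0, 1) */
}

/* e^(-x) from the Taylor series of e^x. This is adequate for the few-Morgan
 * values used here and avoids linking the maths library. */
static double exp_neg(double x) {
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < 60; k++) { term *= x / k; sum += term; }
    return 1.0 / sum;
}

/* Knuth's multiplication method for Poisson draws with small means. */
static int rng_poisson(Rng *r, double lambda) {
    double limit = exp_neg(lambda), p = 1.0;
    int k = 0;
    do { k++; p *= rng_uniform(r); } while (p > limit);
    return k - 1;
}

/* Produces one gamete from a homologous pair. pos_cM holds sorted marker
 * positions in [0, length_cM], and hom_a/hom_b hold the two homologs' alleles. */
static void make_gamete(Rng *r, const double *pos_cM, size_t n_markers,
                        double length_cM, const char *hom_a,
                        const char *hom_b, char *gamete) {
    assert(length_cM >= 0.0);
    /* a) crossover count ~ Poisson(chromosome length in Morgans) */
    int n_cross = rng_poisson(r, length_cM / 100.0);
    if (n_cross > MAX_CROSSOVERS) n_cross = MAX_CROSSOVERS;
    /* b) crossover positions drawn uniformly, kept sorted by insertion */
    double cross[MAX_CROSSOVERS];
    for (int i = 0; i < n_cross; i++) {
        double v = rng_uniform(r) * length_cM;
        int j = i - 1;
        while (j >= 0 && cross[j] > v) { cross[j + 1] = cross[j]; j--; }
        cross[j + 1] = v;
    }
    /* c) pick the starting homolog, i.e. one of the 2 complementary gametes */
    int on_b = rng_uniform(r) < 0.5;
    /* d) read along the chromosome, swapping homologs at each crossover */
    int next = 0;
    for (size_t m = 0; m < n_markers; m++) {
        while (next < n_cross && cross[next] < pos_cM[m]) { on_b = !on_b; next++; }
        gamete[m] = on_b ? hom_b[m] : hom_a[m];
    }
}
```

Choosing the starting homolog with a fair coin in step c) is equivalent to picking one of the two complementary recombinant gametes at random. As a result, each homolog contributes the allele at any given marker with equal probability.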
Fig. 2. a) Diagram of each cycle of a simple breeding program plan. The program was simulated using genomicSimulation (see Box 1 for the genomicSimulation implementation). b) Mean and c) variance of the population's genetic breeding values, shown for each replicate (thin lines) and averaged across replicates (thick lines). In the first condition, the individuals with the best phenotypes across the entire population in the relevant generation are selected. In the other condition, the individuals with the best phenotypes within each family are selected.
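The two selection conditions in Fig. 2 differ only in the group over which phenotypes are ranked. The following C sketch uses a hypothetical function, `truncation_select`, which is not part of genomicSimulation's API. It assumes that larger phenotypes are better and that families are identified by integer labels. Passing `NULL` for the family array ranks the whole population.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical truncation selection. Sets selected[i] to 1 if individual i
 * is among the k highest phenotypes of its group, and to 0 otherwise. The group
 * is the whole population when family is NULL, or i's family otherwise.
 * Ties are broken by index, so exactly min(k, group size) are chosen per group. */
static void truncation_select(const double *pheno, const int *family,
                              size_t n, size_t k, int *selected) {
    assert(pheno != NULL && selected != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t ranked_above = 0; /* group-mates that outrank individual i */
        for (size_t j = 0; j < n; j++) {
            if (j == i) continue;
            if (family != NULL && family[j] != family[i]) continue;
            if (pheno[j] > pheno[i] || (pheno[j] == pheno[i] && j < i))
                ranked_above++;
        }
        selected[i] = ranked_above < k;
    }
}
```

The quadratic loop keeps the sketch short. Sorting indices within each group would scale better to large populations.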
Fig. 3. a) The crossing plan of a structured population developed by Alahmad. Matrices of Rogers' genetic distance between (b) all real imputed genotypes and (c) all simulated genotypes of the final-generation offspring resulting from that crossing design.
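Rogers' genetic distance, used for the matrices in Fig. 3b,c, averages a per-locus term over all loci. That term is the square root of half the summed squared differences between the two individuals' allele frequencies. For diploid genotypes at biallelic SNPs, the term reduces to |p − q|, where p and q are the individuals' within-individual allele frequencies. The sketch below assumes genotypes are coded as 0/1/2 allele dosages. The function names and the row-major matrix layout are hypothetical and are not taken from genomicSimulation.

```c
#include <assert.h>
#include <stddef.h>

/* Rogers' distance between two diploid genotypes at biallelic SNPs coded as
 * 0/1/2 allele dosages. Per locus, sqrt(0.5 * sum_a (p_a - q_a)^2) = |p - q|,
 * with p = dosage / 2, so the distance is the mean of |x - y| / 2. */
static double rogers_distance(const unsigned char *x, const unsigned char *y,
                              size_t n_loci) {
    assert(n_loci > 0);
    double total = 0.0;
    for (size_t l = 0; l < n_loci; l++) {
        int d = (int)x[l] - (int)y[l];
        total += (d < 0 ? -d : d) / 2.0;
    }
    return total / (double)n_loci;
}

/* Fills a symmetric n x n distance matrix (row-major) for n genotypes stored
 * one after another in geno, each n_loci dosages long. */
static void rogers_matrix(const unsigned char *geno, size_t n, size_t n_loci,
                          double *out) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            out[i * n + j] = out[j * n + i] =
                rogers_distance(geno + i * n_loci, geno + j * n_loci, n_loci);
}
```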
Fig. 4. Mean LD decay between markers in the real founding genotypes of the NAM population, the real F6 genotypes, and the simulated F6 genotypes, shown as a histogram over the distance between markers in centiMorgans (according to the converted physical map).
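The LD decay curves in Fig. 4 summarize pairwise linkage disequilibrium as a function of map distance. The caption does not state which LD statistic was used. The sketch below therefore assumes r² computed as the squared Pearson correlation of allele dosages (0/1/2) across individuals, averaged within fixed-width distance bins. The function names, the marker-major genotype layout and the bin width are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Pearson correlation of allele dosages at two markers across n
 * individuals (an assumed r^2 definition). Returns 0 if either is monomorphic. */
static double dosage_r2(const unsigned char *a, const unsigned char *b, size_t n) {
    assert(n > 0);
    double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
    for (size_t i = 0; i < n; i++) {
        sa += a[i];
        sb += b[i];
        saa += (double)a[i] * a[i];
        sbb += (double)b[i] * b[i];
        sab += (double)a[i] * b[i];
    }
    double cov = sab - sa * sb / n, va = saa - sa * sa / n, vb = sbb - sb * sb / n;
    if (va <= 0 || vb <= 0) return 0.0;
    return cov * cov / (va * vb);
}

/* geno is marker-major (geno[m * n_ind + i]) for markers on one chromosome.
 * Marker pairs are binned by distance into bins of width bin_cM. mean_r2[b]
 * receives the average r^2 in bin b (0 if empty) and counts[b] the pair count. */
static void ld_decay(const unsigned char *geno, const double *pos_cM,
                     size_t n_markers, size_t n_ind, double bin_cM,
                     size_t n_bins, double *mean_r2, size_t *counts) {
    for (size_t b = 0; b < n_bins; b++) { mean_r2[b] = 0.0; counts[b] = 0; }
    for (size_t m1 = 0; m1 < n_markers; m1++) {
        for (size_t m2 = m1 + 1; m2 < n_markers; m2++) {
            double d = pos_cM[m2] - pos_cM[m1];
            if (d < 0) d = -d;
            size_t b = (size_t)(d / bin_cM);
            if (b >= n_bins) continue; /* pair lies beyond the plotted range */
            mean_r2[b] += dosage_r2(geno + m1 * n_ind, geno + m2 * n_ind, n_ind);
            counts[b]++;
        }
    }
    for (size_t b = 0; b < n_bins; b++)
        if (counts[b] > 0) mean_r2[b] /= (double)counts[b];
}
```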
Mean execution time (in seconds) over 6 repeats of each benchmark task in genomicSimulation v0.2, run on a consumer laptop with 8 GB RAM and an Intel i5-7200U CPU @ 2.50 GHz.
| Tool (mean execution time, s) | Load 50 genotypes of 5,000 SNPs | Perform 10⁵ random crosses | Get resulting breeding values | Get resulting genotypes |
|---|---|---|---|---|
| genomicSimulationC | 0.66 | 4.26 | 70.13 | 67.99 |
| genomicSimulation (in RStudio) | 0.97 | 2.05 | 74.60 | 108.18 |
| MoBPS (in RStudio) | 0.74 | 193.99 | 0.24 | No equivalent |
| AlphaSimR (in RStudio) | 0.25 | 15.47 | 3.12 | 102.29 |
| BreedingSchemeLanguage (in RStudio) | 1.19 | * | * | * |
The benchmarks are compared with the times taken to perform comparable tasks in the simulation tools MoBPS (Pook), AlphaSimR (Gaynor), and Breeding Scheme Language (Yabe). The benchmarked tasks are: (1) loading 50 genotypes of 5,000 SNPs; (2) performing 10⁵ (100,000) random crosses between those genotypes, with one progeny per cross; (3) calculating and then saving the breeding values of the 10⁵ genotypes from task 2 to an R data frame (genomicSimulationC instead saves them to a file); and (4) saving the 10⁵ genotypes from task 2 to a file. The R version of genomicSimulation also shares the C library's functionality for saving simulated data to files rather than to R data frames, and the time taken to save output to files is comparable across the R and C versions. An asterisk (*) marks tasks that could not be benchmarked because of memory limitations on the testing machine.
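Benchmark task 3 computes breeding values for the simulated genotypes. Under a purely additive model, each marker contributes its effect once per copy of the effect allele, so a breeding value is the dot product of allele counts and marker effects. The hypothetical C function below illustrates that calculation under this assumed model. It does not reproduce genomicSimulation's effect-file format or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical additive breeding-value calculation.
 * geno:   genotype-major matrix of effect-allele counts (0/1/2),
 *         geno[g * n_markers + m]
 * effect: assumed per-allele effect of each marker
 * bv:     output, one breeding value per genotype */
static void breeding_values(const unsigned char *geno, size_t n_geno,
                            size_t n_markers, const double *effect,
                            double *bv) {
    assert(geno != NULL && effect != NULL && bv != NULL);
    for (size_t g = 0; g < n_geno; g++) {
        const unsigned char *row = geno + g * n_markers;
        double sum = 0.0;
        for (size_t m = 0; m < n_markers; m++)
            sum += row[m] * effect[m]; /* allele count times per-allele effect */
        bv[g] = sum;
    }
}
```

The cost of this loop grows with the number of genotypes multiplied by the number of markers. For task 3, that means 10⁵ genotypes × 5,000 markers.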