
forqs: forward-in-time simulation of recombination, quantitative traits and selection.

Abstract

SUMMARY: forqs is a forward-in-time simulation of recombination, quantitative traits and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on polygenic quantitative traits.
AVAILABILITY AND IMPLEMENTATION: forqs is implemented as a command-line C++ program. Source code and binary executables for Linux, OSX and Windows are freely available under a permissive BSD license: https://bitbucket.org/dkessner/forqs.

Year: 2013 PMID: 24336146 PMCID: PMC3928523 DOI: 10.1093/bioinformatics/btt712

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Simulations have a long history in population genetics, both for verifying analytical results and for exploring population models that are mathematically intractable. Population genetics simulations can be broadly classified as forward-in-time (e.g. Wright–Fisher) or backward-in-time (e.g. coalescent). Coalescent simulations [e.g. ms (Hudson, 2002), MaCS (Chen et al.), fastsimcoal (Excoffier and Foll, 2011)] are efficient for simulating neutral sequence data because they only need to track lineages that are ancestral to the sample. Although it is possible to simulate certain selection scenarios within the coalescent framework (Ewing and Hermisson, 2010; Hudson and Kaplan, 1988), one must turn to forward-in-time simulations to model selection in a flexible way.
Many forward-in-time simulators are currently available. Most of these simulators use a mutation-centric approach, implemented by storing the mutations carried by individuals in an array. To handle selection, the majority of these simulators assign selection coefficients to individual mutations [e.g. ForwSim (Padhukasahasram et al.), Fregene (Chadeau-Hyam et al.), GENOMEPOP (Carvajal-Rodríguez, 2008), SFS_CODE (Hernandez, 2008), TreesimJ (O’Fallon, 2010), SLiM (Messer, 2013)], although a few also include support for quantitative traits [e.g. ForSim (Lambert et al.), quantiNemo (Neuenschwander et al.), simuPOP (Peng and Kimmel, 2005)]. Hoban et al. and Yuan et al. provide recent reviews that comprehensively compare these and other simulators.
In many scenarios of biological interest, substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on standing variation, rather than mutational input. For example, one may be interested in the genome-wide haplotype patterns that emerge from admixture between historically isolated populations (Wegmann et al.) or from artificial selection on a quantitative trait. Studying these haplotype patterns can be difficult with existing forward-in-time simulators because detailed information about the mosaic haplotype structure of individuals is not readily available, and must be inferred from the output sequences of the simulation and/or stored recombination event data. In addition, forward-in-time simulators that store entire sequences incur a severe trade-off between the size of the genomic regions and the size of the populations simulated.
Motivated by such examples, we have implemented a new forward-in-time simulation approach that, instead of tracking single-site variants, tracks individual haplotype chunks as they recombine over multiple generations. Further, we have designed the simulator for fast simulation of quantitative traits under selection. We have named this software forqs (Forward-in-time simulation of Recombination, Quantitative Traits and Selection). Similar approaches have been implemented recently by Haiminen et al. and by Aberer and Stamatakis (2013) for simple selection models with per-mutation fitness effects.
The haplotype-based design allows for fast simulation of whole genomes, with efficient memory usage. For example, forqs can easily simulate two populations (size 10 000 each) selected for different optimal trait values, where individuals have human-sized genomes (23 chromosome pairs, 100 Mb each), taking ∼2 s/generation. For comparison, existing forward simulators are limited by the amount of sequence that can be stored in arrays in memory: for the aforementioned 20 000 individuals, 16 GB of memory would permit the storage of only 3.2 million base pairs of sequence per individual, which is an order of magnitude smaller than the smallest human chromosome. The forqs design also preserves information about the haplotype structure of individuals, which allows for immediate identification of genomic regions where individuals share identical-by-descent haplotype tracts.
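
The 3.2 million base pair figure can be reproduced with a short calculation. The sketch below is not part of forqs. It assumes a packed encoding of 2 bits per base, decimal gigabytes (16 × 10^9 bytes) and memory split evenly across individuals; none of these assumptions is stated in the text, and the function name is illustrative.

```python
def max_bases_per_individual(memory_bytes: int, n_individuals: int,
                             bits_per_base: int = 2) -> float:
    """Bases of sequence each individual can store if memory is shared evenly.

    bits_per_base=2 is an assumption (four nucleotides packed into 2 bits).
    """
    total_bits = memory_bytes * 8
    return total_bits / bits_per_base / n_individuals


# 16 GB (decimal) shared by two populations of 10 000 individuals each.
bases = max_bases_per_individual(16 * 10**9, 20_000)
print(f"{bases / 1e6:.1f} Mb per individual")  # prints "3.2 Mb per individual"
```

Under these assumptions the result matches the figure quoted above; a less compact encoding (for example one byte per base) would lower it further.
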
Our simulator uses a modular architecture to allow the user to flexibly specify recombination maps, mutation rates, demographic models, quantitative traits and fitness functions. This modular approach facilitates simulation of complicated scenarios and investigation of the resulting haplotype patterns. forqs is currently under active development to support ongoing projects.

2 DESIGN AND IMPLEMENTATION

forqs begins with a set of founding haplotypes carried by the individuals in the initial generation. Individuals are diploid and carry a user-specified number of chromosome pairs. By assigning a unique identifier to each founding haplotype, individual haplotype chunks are tracked as they recombine over subsequent generations (Fig. 1). For the purposes of simulation, any existing neutral variation on the haplotype chunks can be ignored, and only those loci with fitness effects need to be tracked.
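
To make the chunk-tracking idea concrete, here is a minimal Python sketch. forqs itself is written in C++ and this is not its code. The sketch assumes each chromosome is a sorted list of (position, id) pairs whose first chunk starts at position 0, and that crossover positions are supplied by the caller. The names `founder_at`, `recombine` and `shared_tracts` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Chunk = Tuple[int, int]      # (start position, founding haplotype id)
Chromosome = List[Chunk]     # sorted by position; first chunk starts at 0


def founder_at(chrom: Chromosome, pos: int) -> int:
    """Return the founding haplotype id that covers position pos."""
    starts = [p for p, _ in chrom]
    return chrom[bisect_right(starts, pos) - 1][1]


def recombine(a: Chromosome, b: Chromosome,
              crossovers: Sequence[int], length: int) -> Chromosome:
    """Build a gamete that starts on a and switches parent at each crossover."""
    parents = (a, b)
    bounds = [0] + sorted(crossovers) + [length]
    result: Chromosome = []
    for i in range(len(bounds) - 1):
        begin, end = bounds[i], bounds[i + 1]
        if begin == end:
            continue  # empty segment; the parent still alternates
        src = parents[i % 2]
        piece = [(begin, founder_at(src, begin))]
        piece += [(p, h) for p, h in src if begin < p < end]
        for chunk in piece:
            if result and result[-1][1] == chunk[1]:
                continue  # same founder on both sides: extend previous chunk
            result.append(chunk)
    return result


def shared_tracts(x: Chromosome, y: Chromosome,
                  length: int) -> List[Tuple[int, int, int]]:
    """Intervals (begin, end, id) where x and y derive from the same founder."""
    breaks = sorted({p for p, _ in x} | {p for p, _ in y} | {length})
    tracts: List[Tuple[int, int, int]] = []
    for begin, end in zip(breaks, breaks[1:]):
        h = founder_at(x, begin)
        if h != founder_at(y, begin):
            continue
        if tracts and tracts[-1][1] == begin and tracts[-1][2] == h:
            tracts[-1] = (tracts[-1][0], end, h)  # merge contiguous tracts
        else:
            tracts.append((begin, end, h))
    return tracts


# Double crossover between two founder chromosomes (ids 0 and 1).
child = recombine([(0, 0)], [(0, 1)], [300, 700], 1000)
print(child)  # [(0, 0), (300, 1), (700, 0)]
```

The memory cost of a chromosome in this representation depends on the number of chunks, not on the chromosome length, and tracts that two chromosomes inherit from the same founder can be read off by comparing chunk lists, without examining any sequence.
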

Fig. 1.

forqs chromosome representation. An individual chromosome is represented by a list of haplotype chunks. Each haplotype chunk is represented by two numbers (position, id): the position where it begins and the identifier of the founding haplotype from which it is derived. This cartoon depicts a chromosome with three haplotype chunks as the result of recombination (double crossover) between two founder chromosomes.

forqs performs the following actions during a single cycle of the simulation: (i) generation of new populations, (ii) genotyping, (iii) quantitative trait evaluation, (iv) fitness evaluation and (v) reporting. forqs has a flexible design in which the simulator delegates specific tasks or calculations to configurable modules. The user specifies which modules to instantiate in a configuration file. In addition to the primary modules that are used to specify demography, mutation, recombination, quantitative traits, fitness and reported output, there are several building block modules that provide basic functionality to the primary modules. For example, Trajectory modules provide a unified method for specifying values that change over time, such as population sizes or migration rates. Similarly, Distribution modules can be used to specify how to draw particular random values [e.g. quantitative trait loci (QTL) positions or allele frequencies].
As an illustration of forqs configuration, suppose that a user wants to simulate populations undergoing neutral admixture. The user would specify a PopulationConfigGenerator module representing a stepping stone or island model with the desired population size and migration rate trajectories. However, the user would not specify any quantitative trait modules and would use the default FitnessFunction module, which assigns identical fitness values to all individuals.
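
The building-block idea for the neutral case can be pictured with a small Python sketch. It does not reproduce forqs module names or configuration syntax: `PiecewiseConstantTrajectory` and `neutral_fitness` are hypothetical stand-ins for a Trajectory-style module and for a default fitness function, and the generation numbers and values are made up for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


class PiecewiseConstantTrajectory:
    """Hypothetical trajectory: a value that changes at given generations."""

    def __init__(self, steps: Sequence[Tuple[int, float]]):
        # steps are (first generation, value) pairs; generation 0 is required.
        ordered = sorted(steps)
        if not ordered or ordered[0][0] != 0:
            raise ValueError("steps must include a value for generation 0")
        self._starts = [g for g, _ in ordered]
        self._values = [v for _, v in ordered]

    def value(self, generation: int) -> float:
        """Return the value in force at the given generation."""
        if generation < 0:
            raise ValueError("generation must be non-negative")
        return self._values[bisect_right(self._starts, generation) - 1]


def neutral_fitness(n_individuals: int) -> List[float]:
    """Default-style fitness: every individual gets the same value."""
    return [1.0] * n_individuals


# Illustrative values: the population grows at generation 10,
# and migration begins at generation 5.
population_size = PiecewiseConstantTrajectory([(0, 1000), (10, 5000)])
migration_rate = PiecewiseConstantTrajectory([(0, 0.0), (5, 0.01)])

for generation in (0, 5, 10):
    print(generation,
          population_size.value(generation),
          migration_rate.value(generation))
```

Expressing population size and migration rate through the same interface is what lets a demography module consume either one without knowing how the values were specified.
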
On the other hand, to simulate an artificial selection experiment with truncation selection on a single quantitative trait, the user would specify the trait with QTLs and effect sizes, and choose a FitnessFunction module that selects the desired proportion of individuals to produce the next generation. Alternatively, the user could indicate that the QTLs and effect sizes should be drawn randomly from user-specified distributions.
The representation of chromosomes as haplotype chunks in forqs makes efficient use of memory, independent of the size of the chromosomes. On a typical laptop computer, for a population size of 1 million, neutral simulations take ∼1.5 s/generation and simulations with selection at a single locus take ∼3 s/generation. Decreasing the population size allows the simulation of a greater number of generations in a reasonable amount of time: a population size of 10 000 takes ∼3 s per 100 generations without selection, and slightly longer with selection. However, forqs’ design comes with a trade-off: because recombination continually breaks chromosomes into more chunks, memory usage grows linearly with the number of generations simulated. Thus, for investigations focusing on mutational input over a large number of generations (e.g. studies involving demographic changes taking place over thousands of generations), forqs’ design is not as efficient as array-based implementations (e.g. SLiM or SFS_CODE) that were designed specifically for these scenarios. Similarly, we recommend that forqs be used in conjunction with a coalescent simulator to generate neutral variation, rather than running forqs for a long burn-in period to reach mutation-drift equilibrium.
forqs has been extensively tested for correctness, both at the level of individual code units and in its large-scale behavior in comparison with theoretical predictions from population genetics and quantitative genetics. Validation results, tutorials and documentation can be found in the Supplementary Information.
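
The truncation-selection scenario described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the forqs implementation: the trait is assumed to be purely additive (allele count times effect size, summed over QTLs), selected individuals receive fitness 1.0 and all others 0.0, and the function names are hypothetical.

```python
from typing import List, Sequence


def trait_values(genotypes: Sequence[Sequence[int]],
                 effects: Sequence[float]) -> List[float]:
    """Additive trait (assumption): sum over QTLs of allele count * effect."""
    return [sum(g * e for g, e in zip(individual, effects))
            for individual in genotypes]


def truncation_fitness(traits: Sequence[float],
                       proportion: float) -> List[float]:
    """Fitness 1.0 for the top `proportion` of trait values, else 0.0."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    n_selected = max(1, int(len(traits) * proportion))
    # Rank individuals from highest to lowest trait value.
    ranked = sorted(range(len(traits)), key=lambda i: traits[i], reverse=True)
    selected = set(ranked[:n_selected])
    return [1.0 if i in selected else 0.0 for i in range(len(traits))]


# Five diploid individuals genotyped at two QTLs (allele counts 0, 1 or 2).
genotypes = [[0, 1], [2, 2], [1, 0], [2, 1], [0, 0]]
effects = [0.5, 1.0]

traits = trait_values(genotypes, effects)
print(traits)                            # [1.0, 3.0, 0.5, 2.0, 0.0]
print(truncation_fitness(traits, 0.4))   # [0.0, 1.0, 0.0, 1.0, 0.0]
```

Only the genotypes at the QTLs enter the calculation, which is why neutral variation on the haplotype chunks does not need to be tracked during the simulation.
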
Configuration files for all simulations mentioned in this article are included in the forqs software packages.

18 in total

1. Generating samples under a Wright-Fisher neutral model of genetic variation.

Authors: Richard R Hudson
Journal: Bioinformatics Date: 2002-02 Impact factor: 6.937

2. simuPOP: a forward-time population genetics simulation environment.

Authors: Bo Peng; Marek Kimmel
Journal: Bioinformatics Date: 2005-07-14 Impact factor: 6.937

3. Fast and flexible simulation of DNA sequence data.

Authors: Gary K Chen; Paul Marjoram; Jeffrey D Wall
Journal: Genome Res Date: 2008-11-24 Impact factor: 9.043

4. SLiM: simulating evolution with selection and linkage.

Authors: Philipp W Messer
Journal: Genetics Date: 2013-05-24 Impact factor: 4.562

5. A flexible forward simulator for populations subject to selection and demography.

Authors: Ryan D Hernandez
Journal: Bioinformatics Date: 2008-10-07 Impact factor: 6.937

6. ForSim: a tool for exploring the genetic architecture of complex traits with controlled truth.

Authors: Brian W Lambert; Joseph D Terwilliger; Kenneth M Weiss
Journal: Bioinformatics Date: 2008-06-19 Impact factor: 6.937

7. Exploring population genetic models with recombination using efficient forward-time simulations.

Authors: Badri Padhukasahasram; Paul Marjoram; Jeffrey D Wall; Carlos D Bustamante; Magnus Nordborg
Journal: Genetics Date: 2008-04 Impact factor: 4.562

8. quantiNemo: an individual-based program to simulate quantitative traits with explicit genetic architecture in a dynamic metapopulation.

Authors: Samuel Neuenschwander; Frédéric Hospital; Frédéric Guillaume; Jérôme Goudet
Journal: Bioinformatics Date: 2008-05-01 Impact factor: 6.937

9. Fregene: simulation of realistic sequence-level data in populations and ascertained samples.

Authors: Marc Chadeau-Hyam; Clive J Hoggart; Paul F O'Reilly; John C Whittaker; Maria De Iorio; David J Balding
Journal: BMC Bioinformatics Date: 2008-09-08 Impact factor: 3.169

10. GENOMEPOP: a program to simulate genomes in populations.

Authors: Antonio Carvajal-Rodríguez
Journal: BMC Bioinformatics Date: 2008-04-30 Impact factor: 3.169
