Christos Vlachos, Claire Burny, Marta Pelizzola, Rui Borges, Andreas Futschik, Robert Kofler, Christian Schlötterer.
Abstract
BACKGROUND: The combination of experimental evolution with whole-genome resequencing of pooled individuals, also called evolve and resequence (E&R), is a powerful approach to study selection processes and to infer the architecture of adaptive variation. Given the large potential of this method, a range of software tools has been developed to identify selected SNPs and to measure their selection coefficients.
Year: 2019 PMID: 31416462 PMCID: PMC6694636 DOI: 10.1186/s13059-019-1770-8
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Overview of the evaluated tools
| Tool | t | RAM | ts. | rep. | m/w | Description | Input | Output | lang. | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| | 6 s | 221 MB | No | No | +/+ | Pearson | freq, cov, Ne | p | R | [ |
| E&R- | 8 s | 306 MB | Yes | No | +/+ | | freq, cov, Ne | p | R | [ |
| CLEAR | 3000 s | 1100 MB | Yes | Yes | +/+ | Discrete HMM of allele trajectories under a WF model | sync, Ne | s, Ne, h, LL | Python | [ |
| cmh | 216 s | 145 MB | No | Yes | +/+ | Test for homogeneity (similar to | sync | p | Perl/R | [ |
| E&R-cmh | 8 s | 560 MB | Yes | Yes | +/+ | CMH test adapted to account for drift | freq, cov, Ne | p | R | [ |
| LLS | 1091 s (83 h) | 340 MB | Yes | Yes | +/+ | Linear model with least-squares regression of logit-transformed allele frequencies | freq, cov, Ne | p, s, h | R | [ |
| LRT-1 | 31 s | 127 MB | No | Yes | −/− | LRT of parallel selection | freq, cov, Ne | LRT, | Python | [ |
| LRT-2 | 31 s | 127 MB | No | Yes | −/− | LRT of heterogeneous selection | freq, cov, Ne | LRT, | Python | [ |
| GLM | 220 s | 300 MB | Yes | Yes | +/+ | Quasibinomial GLM with replicates and time as predictors | freq | p | R | [ |
| LM | 157 s | 300 MB | Yes | Yes | +/+ | LM with replicates and time as predictors | freq | p | R | [ |
| BBGP | 37 h | 15 MB | Yes | Yes | +/+ | A Bayesian model of allele trajectories following a Gaussian process | sync | BF | R | [ |
| FIT1 | 16 s | 220 MB | Yes | No | −/− | A | freq | p | R | [ |
| FIT2 | 68 s | 220 MB | No | Yes | −/− | A | freq | p | R | [ |
| WFABC | 42 h | 8 MB | Yes | No | +/+ | ABC of WF dynamics with selection | freq, Ne (h) | BF, s | C++ | [ |
| slattice | 41 h | 250 MB | Yes | No | +/+ | HMM of allele trajectories under a WF model using an EM algorithm | freq, Ne (h) | s, LL | R | [ |
For each tool, we show the time required to analyze a small data set (t, in seconds (s) or hours (h)), the memory requirement (RAM), whether time-series data can be used (ts.), whether replicates are accepted (rep.), whether a manual and a walk-through are available (m/w), a short description, the required input, the generated output, the programming language (lang.), and the reference. For LLS, the first value is the time required to estimate the selection coefficient and the value in brackets is the time required to compute the p-value. Abbreviations: sync, sync file; freq, allele frequency; cov, coverage; Ne, effective population size; h, heterozygous effect; p, p-value; s, selection coefficient; LRT, likelihood ratio test; BF, Bayes factor; LL, log-likelihood; shared allele frequency change; dx, change in allele frequency in a single replicate r.
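Several tools in the table, such as cmh, test whether allele counts differ between two time points, pooling replicates with a Cochran-Mantel-Haenszel (CMH) test. The sketch below shows the textbook Pearson χ² test for a single replicate and the CMH test across replicates, assuming each replicate is summarized as read counts of two alleles at the start and at the end of the experiment. It omits the drift and pool-sequencing corrections of E&R-cmh and is not the code of any listed tool; all function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.
    For 1 df, P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def pearson_chi2(a0, b0, a1, b1):
    """Pearson chi-square test (no continuity correction) on one 2x2 table.
    Rows are time points (start, end); columns are the counts of alleles A and B."""
    n = a0 + b0 + a1 + b1
    row = [a0 + b0, a1 + b1]
    col = [a0 + a1, b0 + b1]
    obs = [[a0, b0], [a1, b1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (obs[i][j] - expected) ** 2 / expected
    return stat, chi2_sf_1df(stat)


def cmh(tables):
    """Cochran-Mantel-Haenszel test (no continuity correction) over K replicate
    2x2 tables, each given as (a0, b0, a1, b1)."""
    deviation = 0.0
    variance = 0.0
    for a0, b0, a1, b1 in tables:
        n = a0 + b0 + a1 + b1
        r0, r1 = a0 + b0, a1 + b1
        c0, c1 = a0 + a1, b0 + b1
        deviation += a0 - r0 * c0 / n          # observed minus expected count
        variance += r0 * r1 * c0 * c1 / (n * n * (n - 1))
    stat = deviation ** 2 / variance
    return stat, chi2_sf_1df(stat)


# Usage: one SNP in three replicates, counts (A, B) at the start and at the end.
reps = [(50, 50, 70, 30), (48, 52, 66, 34), (52, 48, 72, 28)]
print(pearson_chi2(*reps[0]))
print(cmh(reps))
```

Using `math.erfc` for the 1-df tail probability keeps the sketch free of third-party dependencies; with a consistent allele-frequency change across replicates, the CMH statistic grows with the number of replicates, which is why replicate-aware tests gain power.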
Fig. 1 Overview of the simulated scenarios. (a) Response to selection, displayed for three time points as either fitness (sweep, stabilizing selection) or phenotypic value (truncating selection). For truncating selection, the fraction of culled individuals is indicated in color. Under stabilizing selection, once the trait optimum is reached, selection acts to reduce the fitness variance within the population. (b) Schematic representation of the trajectories of the selection targets expected under the three scenarios.
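The two phenotypic selection regimes in Fig. 1 can be illustrated with simple fitness rules. This is a sketch under stated assumptions: truncating selection is modeled as removing a fixed fraction of the individuals with the lowest phenotypes, and stabilizing selection as a Gaussian fitness function around the trait optimum. The Gaussian form, the `width` parameter, and the function names are hypothetical and not necessarily what the study's simulations used.

```python
import math


def truncating_survivors(phenotypes, cull_fraction):
    """Indices of individuals kept under truncating selection (assumed model):
    the cull_fraction of individuals with the lowest phenotypes is removed,
    with the number culled rounded down."""
    n = len(phenotypes)
    n_cull = int(n * cull_fraction)
    order = sorted(range(n), key=lambda i: phenotypes[i])
    return sorted(order[n_cull:])


def stabilizing_fitness(z, optimum, width):
    """Assumed Gaussian stabilizing-selection fitness: 1.0 at the optimum,
    decaying with the squared distance of phenotype z from it."""
    return math.exp(-((z - optimum) ** 2) / (2.0 * width ** 2))


# Usage
print(truncating_survivors([3.0, 1.0, 2.0, 5.0], 0.5))  # [0, 3]
print(stabilizing_fitness(3.0, optimum=2.0, width=1.0))
```

Under the Gaussian rule, individuals far from the optimum on either side lose fitness, so once the population mean sits at the optimum, selection mainly penalizes phenotypic spread, matching the caption's description of reduced fitness variance.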
Fig. 2 Performance of the tools under three different scenarios. The performance of tools supporting replicates (left panels) and not supporting replicates (right panels) was analyzed separately. For fast tools, the entire data set was analyzed (solid lines), whereas a subset of the data was used for slow tools (dashed lines). The performance of a random classifier is shown as a reference (black dotted line). (a) Selective sweeps. (b) Truncating selection. (c) Stabilizing selection.
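Comparing a tool against a random classifier is commonly done with a receiver operating characteristic (ROC) curve, where a random classifier follows the diagonal and has an area under the curve (AUC) of 0.5. The sketch below assumes that each SNP has a tool score (higher meaning more likely selected) and a label saying whether it is a simulated selection target; the function names are hypothetical.

```python
def roc_curve(scores, is_target):
    """ROC points (false positive rate, true positive rate) obtained by ranking
    SNPs by decreasing score; tied scores are processed as one threshold step.
    Assumes at least one target and one non-target SNP."""
    pairs = sorted(zip(scores, is_target), key=lambda p: -p[0])
    n_pos = sum(1 for _, t in pairs if t)
    n_neg = len(pairs) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Usage: a perfect ranking versus an uninformative one.
print(auc(roc_curve([0.9, 0.8, 0.2, 0.1], [True, True, False, False])))
print(auc(roc_curve([1, 1, 1, 1], [True, False, True, False])))  # 0.5
```

Grouping tied scores matters for the random-classifier reference: a tool that gives every SNP the same score produces only the diagonal, with an AUC of exactly 0.5.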
Fig. 3 Accuracy of the estimated selection coefficients, measured as mean squared error (MSE). Results are shown for tools supporting (black) and not supporting (blue) multiple replicates.
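To make the MSE criterion concrete, the sketch below estimates a selection coefficient from the least-squares slope of logit-transformed allele frequencies, in the spirit of the LLS description in the table but not its implementation. It assumes genic (haploid) selection, under which the log-odds grow linearly with time, so that

$$\operatorname{logit}(p_t) = \operatorname{logit}(p_0) + t \ln(1 + s), \qquad \hat{s} = e^{\hat{\beta}} - 1,$$

where $\hat{\beta}$ is the fitted slope. The trajectory generator and all function names are hypothetical, and the noise-free trajectories stand in for simulated data.

```python
import math


def logit(p):
    """Log-odds of an allele frequency; undefined at p = 0 or p = 1."""
    return math.log(p / (1.0 - p))


def genic_trajectory(p0, s, generations):
    """Deterministic allele-frequency trajectory under genic (haploid) selection."""
    traj = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + s * p)
        traj.append(p)
    return traj


def estimate_s(times, freqs):
    """Least-squares slope of logit(p) on time, converted to s via
    logit(p_t) = logit(p_0) + t * ln(1 + s)."""
    y = [logit(p) for p in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return math.exp(sxy / sxx) - 1.0


def mse(estimates, truths):
    """Mean squared error between estimated and true selection coefficients."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


# Usage: sample every 10 generations over 60 generations.
times = list(range(0, 61, 10))
true_s = [0.02, 0.05, 0.1]
est = [estimate_s(times, genic_trajectory(0.05, s, 60)[::10]) for s in true_s]
print(mse(est, true_s))  # near zero for noise-free trajectories
```

With real or drift-affected data the logit trajectory is no longer exactly linear, and frequencies of 0 or 1 must be handled before the transform, which is where estimation error, and hence a nonzero MSE, arises.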
Fig. 4 The tools perform similarly with data from different real E&R studies. We performed a PCA on the normalized test statistics of tools supporting (left panel) and not supporting (right panel) replicates. Data are from E&R studies in *D. simulans* [7], *C. elegans* [33], and yeast [9].
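A PCA of test statistics places tools that rank SNPs similarly close together. The sketch below assumes a matrix with one row per SNP and one column per tool, z-score normalizes each tool's column (the normalization used in the study may differ), and then treats tools as observations in a singular value decomposition. The function name is hypothetical.

```python
import numpy as np


def pca_of_tools(stats, n_components=2):
    """stats: array of shape (n_snps, n_tools), one test statistic per tool.
    Each tool's column is z-scored (an assumed normalization); tools are then
    treated as observations and projected onto principal components."""
    x = np.asarray(stats, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)   # assumes no constant column
    m = z.T - z.T.mean(axis=0)                 # rows = tools, centered per SNP
    u, sing, _ = np.linalg.svd(m, full_matrices=False)
    scores = u[:, :n_components] * sing[:n_components]
    explained = sing ** 2 / np.sum(sing ** 2)
    return scores, explained[:n_components]


# Usage: two tools that agree up to a linear rescaling and one that disagrees.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
stats = np.column_stack([base, 2.0 * base + 1.0, -base])
scores, explained = pca_of_tools(stats)
print(scores[:, 0], explained)
```

Because each column is standardized first, tools whose statistics differ only in scale or offset coincide in the projection, so the PCA reflects how tools rank SNPs rather than the units of their statistics.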
Overview of the default parameters used for the simulations
| Parameter | Default value |
|---|---|
| Chromosome | 2L |
| Population size | 1000 |
| Number of causative loci | 30 |
| Number of generations | 60 |
| Replicates | 10 |
| Heritability | 1.0 |
| Recombination map | Comeron et al. [ |
| Repetitions | 100 (using different sets of selected SNPs) |
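The default parameters above can be illustrated with a much simpler forward simulation. This sketch assumes a single unlinked locus under genic selection in a diploid Wright-Fisher population, with drift modeled by binomial sampling of 2N gene copies each generation. Unlike the study's simulations, it ignores the 30 causative loci, linkage, the recombination map, and the phenotypic selection regimes; the starting frequency, selection coefficient, and function name are hypothetical.

```python
import numpy as np


def simulate_wf(p0, s, n_individuals=1000, generations=60, replicates=10, seed=0):
    """Single-locus Wright-Fisher allele-frequency trajectories with genic
    selection (an assumed model). Returns an array of shape
    (replicates, generations + 1)."""
    rng = np.random.default_rng(seed)
    two_n = 2 * n_individuals  # diploid population: 2N gene copies
    traj = np.empty((replicates, generations + 1))
    traj[:, 0] = p0
    p = np.full(replicates, p0, dtype=float)
    for g in range(1, generations + 1):
        # Deterministic change from selection, then drift by binomial sampling.
        p_sel = p * (1.0 + s) / (1.0 + s * p)
        p = rng.binomial(two_n, p_sel) / two_n
        traj[:, g] = p
    return traj


# Usage with the default population size, generations, and replicates above.
traj = simulate_wf(p0=0.1, s=0.1)
print(traj[:, -1])
```

Each replicate shares the same selection coefficient but draws its own drift, so the spread among replicate trajectories illustrates why replicate-aware tests can separate consistent selection responses from drift.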