Abstract
The availability of whole genome sequencing data from multiple related populations creates opportunities to test sophisticated population genetic models of convergent adaptation. Recent work by Lee and Coop (2017) developed models to infer modes of convergent adaptation at local genomic scales, providing a rich framework for assessing how selection has acted across multiple populations at the tested locus. Here I present rdmc, an R package that builds on the existing software implementation of Lee and Coop (2017) and prioritizes ease of use, portability, and scalability. I demonstrate installation and provide a comprehensive overview of the package's current utilities.
Keywords: Composite Likelihood; Convergent Adaptation; Population Genetics; R; Software
Year: 2020 PMID: 32680854 PMCID: PMC7467004 DOI: 10.1534/g3.120.401527
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Description of the arguments used with the function parameter_barge()
| Function argument | Description |
|---|---|
| neutral_freqs | Matrix of allele frequencies at putatively neutral sites with dimensions, number of populations x number of sites. |
| selected_freqs | Matrix of allele frequencies at putatively selected sites with dimensions, number of populations x number of sites. |
| selected_pops | Vector of indices for populations that experienced selection. |
| positions | Vector of genomic positions for the selected region. |
| n_sites | Integer for the number of sites to propose as the selected site. Sites are uniformly placed along positions using seq(min(positions), max(positions), length.out = n_sites). Must be less than or equal to length(positions). Cannot be used with sel_sites. |
| sel_sites | Optional vector of sites to propose as the selected site. Useful if particular sites are suspected to be under selection. Cannot be used with n_sites. |
| sample_sizes | Vector of sample sizes, with one entry per population. |
| num_bins | The number of bins in which to bin alleles a given distance from the proposed selected sites. |
| sets | A list of population indices, where each element in the list contains a vector of populations with a given mode of convergence. For example, if populations 2 and 6 share a mode and population 3 has another, sets = list(c(2,6), 3). Only required for fitting models with mixed modes. Must be used in conjunction with the "modes" argument. |
| modes | Character vector with one entry per element of sets, defining the mode for each set of selected populations ("independent", "standing", and/or "migration"). Only required for fitting models with mixed modes. More details about the modes are available on the help page for mode_cle. |
| sels | Vector of proposed selection coefficients. |
| migs | Vector of proposed migration rates (proportion of individuals of migrant origin each generation). Cannot be 0. |
| times | Vector of proposed times, in generations, that the variant is standing in populations before selection occurs and prior to migration from the source population. |
| gs | Vector of proposed frequencies of the standing variant. |
| sources | Vector of population indices to propose as the source population of the beneficial allele. Used for both the migration and standing variant with source models. Note: the source must be one of the populations contained in selected_pops. |
| Ne | Effective population size (assumed equal for all populations). |
| rec | Per-base recombination rate for the putatively selected region. |
| locus_name | String to name the locus. Helpful if multiple loci will be combined in subsequent analyses. Defaults to "locus". |
| cholesky | Logical indicating whether to use Cholesky factorization of the covariance matrix. Used for both the inverse and the determinant. Faster, but not guaranteed to work for all data sets. TRUE by default. If FALSE, ginv from MASS is used. |
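Two of the arguments above are easier to follow with concrete values: how n_sites spreads proposed sites across positions, and how sets and modes pair up for mixed-mode models. The following is a minimal base-R sketch, not rdmc code. All input values are invented for illustration, check_mixed_modes() is a hypothetical helper that is not part of rdmc, and its requirement that sets covers exactly the populations in selected_pops is an assumption rather than something the table states.

```r
# Toy inputs; all values are invented for illustration only.
positions <- c(100, 250, 400, 900, 1500, 2200, 3000)
n_sites <- 4

# n_sites must not exceed the number of positions (see the n_sites row).
stopifnot(n_sites <= length(positions))

# Proposed selected sites, placed uniformly along the region
# using the same seq() expression given in the table.
proposed_sites <- seq(min(positions), max(positions), length.out = n_sites)

# Mixed-mode specification: populations 2 and 6 share one mode,
# population 3 has another (mirrors the example in the sets row).
selected_pops <- c(2, 3, 6)
sets <- list(c(2, 6), 3)
modes <- c("standing", "independent")

# Hypothetical helper (NOT part of rdmc): checks that there is one mode
# per set, that every mode is one of the three named in the table, and
# (an assumption) that the sets cover exactly the selected populations.
check_mixed_modes <- function(sets, modes, selected_pops) {
  valid_modes <- c("independent", "standing", "migration")
  length(modes) == length(sets) &&
    all(modes %in% valid_modes) &&
    setequal(unlist(sets), selected_pops)
}

print(proposed_sites)
print(check_mixed_modes(sets, modes, selected_pops))
```

Note that the proposed sites produced by seq() need not coincide with any observed position; they are evenly spaced coordinates between the smallest and largest position.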
Benchmarking of three rdmc model types. Computation time, memory allocation, and the number of garbage collections are reported for the original (dmc) code written by Lee and Coop (2017) and for the two matrix inversion methods available in rdmc (ginv and chol.). Median time was estimated using 5 iterations of each model. Execution time is reported in seconds. Benchmarking was conducted using the R package bench (Hester 2020). Code was executed in an interactive job on the UC Davis Farm HPC (2.00GHz Intel Xeon CPU, 124GB RAM).
| Model | Version | Median time (s) | Memory | Garbage collections |
|---|---|---|---|---|
| ind. | dmc | 15.1 | 230.6MB | 1 |
| ind. | chol. | 12.9 | 109.2MB | 3 |
| ind. | ginv | 18.4 | 195.6MB | 1 |
| migration | dmc | 264.6 | 2.9GB | 19 |
| migration | chol. | 182.3 | 1.6GB | 55 |
| migration | ginv | 321.5 | 2.8GB | 18 |
| std.var | dmc | 780.2 | 8.6GB | 52 |
| std.var | chol. | 537.4 | 4.8GB | 136 |
| std.var | ginv | 898.5 | 8.6GB | 49 |
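The chol. and ginv rows differ in how the covariance matrix is inverted. As a rough illustration of the two routes, the base-R sketch below inverts a toy positive-definite matrix with a Cholesky factorization and with MASS::ginv, and checks that both agree. This is not rdmc's internal code: the matrix is random toy data, and the use of determinant() as a reference for the log-determinant is only a check for this example, not a claim about how rdmc computes it.

```r
library(MASS)  # provides ginv(); MASS is a recommended package shipped with R

set.seed(1)
# Toy symmetric positive-definite matrix standing in for a covariance matrix.
x <- matrix(rnorm(200), nrow = 40, ncol = 5)
sigma <- crossprod(x) / nrow(x) + diag(1e-3, 5)

# Cholesky route: a single factorization yields both the inverse
# and the log-determinant.
u <- chol(sigma)                       # upper triangular, sigma = t(u) %*% u
inv_chol <- chol2inv(u)                # inverse computed from the factor
logdet_chol <- 2 * sum(log(diag(u)))   # log|sigma| from the factor's diagonal

# Generalized-inverse route: slower, but does not require
# the matrix to be positive definite.
inv_ginv <- ginv(sigma)

# Reference log-determinant, used here only to check the Cholesky result.
logdet_ref <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)

print(max(abs(inv_chol - inv_ginv)))   # difference between the two inverses
print(abs(logdet_chol - logdet_ref))   # difference between log-determinants
```

chol() stops with an error when the matrix is not positive definite, which is consistent with the table's note that the Cholesky option is faster but not guaranteed to work for all data sets.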
Figure 1. Visualizing rdmc results for several modes. (A) The composite likelihood score at each of the 10 proposed sites of selection for each model. The true selected site was modeled at position 0. The data were simulated as independent mutations in the three selected populations. (B) The composite likelihood scores over a grid of selection coefficients. The dotted line indicates the true selection coefficient the data were modeled under. Visualizations were made using the R packages ggplot2 (Wickham 2016) and cowplot (Wilke 2019).