Belén Jiménez-Mena, Hugo Flávio, Romina Henriques, Alice Manuzzi, Miguel Ramos, Dorte Meldrup, Janette Edson, Snaebjörn Pálsson, Guðbjörg Ásta Ólafsdóttir, Jennifer R Ovenden, Einar Eg Nielsen.
Abstract
Targeted sequencing is an increasingly popular next-generation sequencing (NGS) approach for studying populations that involves focusing sequencing efforts on specific parts of the genome of a species of interest. Methodologies and tools for designing targeted baits are scarce but in high demand. Here, we present specific guidelines and considerations for designing capture sequencing experiments for population genetics for both neutral genomic regions and regions subject to selection. We describe the bait design process for three diverse fish species: Atlantic salmon, Atlantic cod and tiger shark, which was carried out in our research group, and provide an evaluation of the performance of our approach across both historical and modern samples. The workflow used for designing these three bait sets has been implemented in the R-package supeRbaits, which encompasses our considerations and guidelines for bait design for the benefit of researchers and practitioners. The supeRbaits R-package is user-friendly and versatile. It is written in C++ and implemented in R. supeRbaits and its manual are available from GitHub: https://github.com/BelenJM/supeRbaits.
Keywords: R-package; ancient DNA; baits; capture sequencing; genomics; population genetics
Year: 2022 PMID: 35178874 PMCID: PMC9313901 DOI: 10.1111/1755-0998.13598
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 8.678
FIGURE 1 (a) Illustration of the design of the bait set. Different types of areas are taken into account for the design: exclusion areas, where no baits will be placed; regions of interest, typically genes or other areas to explore in the research questions; and points of interest, typically SNPs. (b) Diagram showing the “on‐target” area. A read was considered “on‐target” if it was located within 350 bp upstream or downstream of the genomic position of the designed 120 bp capture bait.
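The on-target rule in panel (b) can be sketched in a few lines of Python. This is only an illustration, not code from supeRbaits: the function name, the 0-based half-open coordinates, and the choice to count any overlap with the window (rather than requiring the whole read to fall inside it) are assumptions.

```python
BAIT_LENGTH = 120  # bp, bait length given in the caption
FLANK = 350        # bp, allowed distance upstream or downstream of the bait


def is_on_target(read_start, read_end, bait_start,
                 bait_length=BAIT_LENGTH, flank=FLANK):
    """Return True if a read overlaps the bait or its flanking window.

    Assumption: coordinates are 0-based and half-open, [start, end),
    and any overlap with the window counts as on-target.
    """
    window_start = max(0, bait_start - flank)
    window_end = bait_start + bait_length + flank
    return read_start < window_end and read_end > window_start


if __name__ == "__main__":
    # Bait at 1200..1320, so the window spans 850..1670.
    print(is_on_target(1000, 1100, 1200))  # read inside the upstream flank
    print(is_on_target(0, 100, 1200))      # read far upstream of the window
```

Requiring full containment instead would only change the final comparison to `read_start >= window_start and read_end <= window_end`.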
FIGURE 2 Examples of different tiling options for designing baits for a region of interest. (a) Tiling using a given offset distance between baits (e.g., 40 bp), (b) exact tiling (e.g., 3×).
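The two tiling options can be illustrated with a short Python sketch. It is a hedged reading of the caption, not the package's implementation: the function names are hypothetical, "offset" is taken to mean the distance between consecutive bait start positions, and "exact tiling at 3×" is taken to mean that interior bases are covered by three baits, which for a 120 bp bait gives a 40 bp step.

```python
def tile_by_offset(region_start, region_end, bait_length=120, offset=40):
    """Bait start positions whose starts are `offset` bp apart.

    Assumption: 0-based half-open region [region_start, region_end);
    only baits that fit entirely inside the region are returned.
    """
    if offset <= 0:
        raise ValueError("offset must be positive")
    last_start = region_end - bait_length
    return list(range(region_start, last_start + 1, offset))


def tile_exact(region_start, region_end, bait_length=120, depth=3):
    """Bait starts so that interior bases are covered by `depth` baits."""
    if depth <= 0 or bait_length % depth != 0:
        raise ValueError("bait_length must be divisible by depth")
    step = bait_length // depth
    return tile_by_offset(region_start, region_end, bait_length, step)


if __name__ == "__main__":
    print(tile_by_offset(0, 240, offset=40))
    print(tile_exact(0, 240, depth=3))
```

Under these assumptions, 3× tiling of 120 bp baits and a 40 bp offset produce the same start positions; bases near the region edges are covered by fewer baits than interior bases.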
Summary table of the main considerations on the design of baits for population genetics
| Consideration | Type | Example |
|---|---|---|
| Available genomic resources | Genome; Transcriptome; De novo assemblies; Other (close) species | Atlantic salmon and Atlantic cod; Tiger shark |
| Question | Neutral vs. adaptive processes; Population substructuring; Estimates of effective population size; Retrospective genomics; Environmental DNA | Coding/non‐coding regions; Anonymous regions of the genome/transcriptome; Neutral areas of the genome; Coding/non‐coding regions, anonymous regions |
| Type of targeted region | Known SNPs; Genes of interest/quantitative trait loci; Inversions; Neutral areas of the genome | Baits in SNPs (e.g., from SNP‐chips); Randomly allocated baits in genes or regions of interest; Baits in known inversions; Randomly allocated baits |
| Bait length | ~70–200 bp; Up to 20 kbp | 120 bp |
| GC content | Avoid very low (<30%) or very high (>70%) areas | 40%–55% |
| Tiling | Tiling; Mixed tiling/no tiling; No tiling | Tiling for areas of interest/No tiling for random areas |
| Other considerations | Sequence binding (ΔG); Melting temperature (Tm); BLAST hits | |
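The GC content guideline in the table lends itself to a small worked example. The Python sketch below is illustrative only: the function names are hypothetical, and the default bounds simply restate the table's "avoid <30% or >70%" rule.

```python
def gc_fraction(seq):
    """Proportion of G and C nucleotides in a sequence (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc(seq, low=0.30, high=0.70):
    """True if the candidate bait's GC fraction lies within [low, high]."""
    return low <= gc_fraction(seq) <= high


if __name__ == "__main__":
    candidates = ["GGCCAATT", "ATATATATAT", "GCGCGCGCGC"]
    kept = [s for s in candidates if passes_gc(s)]
    print(kept)
```

Narrowing the bounds to `low=0.40, high=0.55` would reproduce the range given in the table's Example column.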
FIGURE 3 (a) Number of baits with more than one read on target, per species (x‐axis) and category explored (modern and historical, y‐axis). (b) Mean number of reads per bait, per species (x‐axis) and category explored (modern and historical, y‐axis). Black lines in (a) and (b) correspond to the median of the samples. (c) Cumulative distribution that describes the fraction of targeted bp covered by a certain number of reads (x‐axis, represented by depth); each coloured line represents an individual from each population and category explored.
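The cumulative distribution in panel (c) can be computed from per-base depths along the following lines. This Python sketch rests on an assumption about the plotted quantity: that each point is the fraction of targeted bp covered by at least a given number of reads. The function name and the toy depth values are hypothetical.

```python
def fraction_covered_at_depth(depths, max_depth):
    """For each d in 0..max_depth, the fraction of targeted bp with depth >= d.

    `depths` holds one read-depth value per targeted base pair.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    return [sum(1 for x in depths if x >= d) / n for d in range(max_depth + 1)]


if __name__ == "__main__":
    toy_depths = [0, 1, 1, 3]  # hypothetical depths for four targeted bases
    print(fraction_covered_at_depth(toy_depths, max_depth=3))
```

Computing one such curve per individual gives one line per sample, as in the figure.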
FIGURE 4 Analysis of the speed at which supeRbaits loads different genomic resources and retrieves baits. (a) Total time spent to import each of the three genomic databases (Atlantic cod, Gadus morhua; tiger shark, Galeocerdo cuvier; and Atlantic salmon, Salmo salar). (b) Average kbp counted per second for each of the genomic databases. (c) Total time required to choose bait locations and extract the respective number of baits from the genomic database, tested with basic conditions.
Main arguments of the supeRbaits main function
| Argument name | Description |
|---|---|
| | Total number of desired baits |
| Size | Length (in bp) of each bait |
| Database | Genomic reference |
| n_per_seq | Number of baits per sequence in the database |
| min_per_seq | Minimum number of baits per sequence in the database |
| Exclusions | Areas of the database to exclude |
| Regions | Specific areas of the database to include |
| Regions.tiling | Choice of tiling for baits allocated in regions |
| Regions.prop | Proportion of baits allocated in regions |
| Targets | Specific points of the database to include (e.g., SNPs) |
| Targets.tiling | Choice of tiling for baits allocated in targets |
| Targets.prop | Proportion of baits allocated in targets |
| Seed | Seed to be set for a repeatable set of baits |
| Restrict | Areas of the database to restrict the baits to |
| gc | Desired range for the proportion of G and C nucleotides within the bait area |
| force | Option to request a very large number of baits to be generated |
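To make the roles of the total bait count, bait size, the two proportion arguments and the seed more concrete, the following Python sketch shows one way such inputs could interact. It is not the supeRbaits API or its algorithm: both functions, their names, and the rounding-down rule are hypothetical and only mirror the descriptions in the table.

```python
import random


def allocate_baits(n, regions_prop=0.0, targets_prop=0.0):
    """Split n baits among regions, targets and random placement.

    Assumption: counts are rounded down and the remainder is placed randomly.
    """
    if regions_prop < 0 or targets_prop < 0 or regions_prop + targets_prop > 1:
        raise ValueError("proportions must be non-negative and sum to at most 1")
    # The small epsilon guards against floating-point products like 299.999...
    n_regions = int(n * regions_prop + 1e-9)
    n_targets = int(n * targets_prop + 1e-9)
    return {"regions": n_regions, "targets": n_targets,
            "random": n - n_regions - n_targets}


def random_bait_starts(seq_length, size, n, seed=None):
    """Draw n random bait start positions; the same seed gives the same set."""
    max_start = seq_length - size
    if max_start < 0:
        return []  # the sequence is shorter than one bait
    rng = random.Random(seed)
    return sorted(rng.randint(0, max_start) for _ in range(n))


if __name__ == "__main__":
    plan = allocate_baits(1000, regions_prop=0.3, targets_prop=0.2)
    print(plan)
    print(random_bait_starts(10_000, size=120, n=plan["random"] // 100, seed=42))
```

Fixing the seed is what makes a randomly placed bait set repeatable, which matches the purpose given for the Seed argument.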