Melanie Lindner1, Fleur Gawehns1, Sebastiaan Te Molder1, Marcel E Visser1,2, Kees van Oers1, Veronika N Laine1,3.
Abstract
The profiling of epigenetic marks like DNA methylation has become a central aspect of studies in evolution and ecology. Bisulphite sequencing is commonly used for assessing genome-wide DNA methylation at single-nucleotide resolution, but these data can also provide information on genetic variants like single nucleotide polymorphisms (SNPs). However, bisulphite conversion causes unmethylated cytosines to appear as thymines, complicating the alignment and subsequent SNP calling. Several tools have been developed to overcome this challenge, but there is no independent evaluation of such tools for non-model species, which often lack genomic references. Here, we used whole-genome bisulphite sequencing (WGBS) data from four female great tits (Parus major) to evaluate the performance of seven tools for SNP calling from bisulphite sequencing data. We used SNPs from whole-genome resequencing data of the same samples as baseline SNPs to assess common performance metrics like sensitivity, precision, and the number of true positive, false positive, and false negative SNPs for the full range of variant and genotype quality values. We found clear differences between the tools in either optimizing precision (Bis-SNP), sensitivity (biscuit), or a compromise between both (all other tools). Overall, the choice of SNP caller strongly depends on which performance parameter should be maximized and whether ascertainment bias should be minimized to optimize downstream analysis, highlighting the need for studies that assess such differences.
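As a toy illustration of why bisulphite conversion complicates SNP calling, the sketch below converts unmethylated cytosines to thymines in silico. The function name `bisulphite_convert`, the sequences, and the methylation positions are hypothetical and not taken from the study; the model also ignores strand information, sequencing errors, and incomplete conversion.

```python
def bisulphite_convert(seq, methylated_positions=frozenset()):
    """Toy in-silico bisulphite conversion of a top-strand sequence.

    Unmethylated cytosines are read as thymines; cytosines at the given
    0-based positions are treated as methylated and stay cytosines.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


# Hypothetical individual 1: reference allele, C at position 4 is unmethylated.
read_1 = bisulphite_convert("ACGTC", methylated_positions={1})
# Hypothetical individual 2: carries a true C>T SNP at position 4.
read_2 = bisulphite_convert("ACGTT", methylated_positions={1})

# Both reads are identical, so this strand alone cannot separate
# an unmethylated cytosine from a genuine C>T variant.
print(read_1, read_2, read_1 == read_2)
```

Reads derived from the opposite strand are not converted at such a position, which is the kind of information a bisulphite-aware caller can use to separate the two cases.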
Keywords: DNA methylation; great tit (Parus major); single nucleotide polymorphism; whole-genome sequencing
Year: 2021 PMID: 34435438 PMCID: PMC9290141 DOI: 10.1111/1755-0998.13493
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 8.678
FIGURE 1 Relationship between precision and sensitivity for SNPs called from whole‐genome bisulphite sequencing data of one sample (F3_E_BD_27272) relative to a list of known SNPs derived from whole‐genome resequencing data of the same sample. Precision and sensitivity were calculated using rtgtools with (a) QUAL and (b) GQ as score fields, meaning that the performance metrics (here precision and sensitivity) were calculated across the full range of parameter values for QUAL and GQ. Thus, the number of data points per tool varies with the tool‐specific and parameter‐specific range of parameter values. If the parameter value is not given, the performance metrics are calculated for the full SNP list (resulting in one data point), and if the full range of a parameter comprises more than 20 values, we reduced it to 20 equally spaced values across the full range of parameter values. Here, only one sample is displayed, but see Figures S12–13 for plots with all samples.
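The threshold sweep described in this caption can be sketched roughly as follows. This is a simplified stand-in, not the rtgtools evaluation itself: SNPs are matched by exact key (here assumed to be a `(chrom, pos, ref, alt)` tuple) without genotype-aware comparison, and the function names and the way the 20 equally spaced values are derived are assumptions.

```python
def threshold_grid(scores, max_points=20):
    """Return the distinct score values, or max_points equally spaced
    values across their range if there are more distinct values."""
    distinct = sorted(set(scores))
    if len(distinct) <= max_points:
        return distinct
    lo, hi = distinct[0], distinct[-1]
    step = (hi - lo) / (max_points - 1)
    return [lo + i * step for i in range(max_points)]


def precision_sensitivity_curve(calls, baseline, max_points=20):
    """calls: dict mapping an SNP key to its score (e.g. QUAL or GQ).
    baseline: set of SNP keys treated as the truth set.
    Returns a list of (threshold, precision, sensitivity) tuples."""
    curve = []
    for t in threshold_grid(list(calls.values()), max_points):
        kept = {snp for snp, score in calls.items() if score >= t}
        tp = len(kept & baseline)   # called and in the baseline
        fp = len(kept - baseline)   # called but not in the baseline
        fn = len(baseline - kept)   # in the baseline but not called
        precision = tp / (tp + fp) if tp + fp else float("nan")
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        curve.append((t, precision, sensitivity))
    return curve
```

Each tuple corresponds to one data point in a precision versus sensitivity plot; a tool that reports no score would contribute a single point computed from its full SNP list.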
FIGURE 2 Number of false negative (teal), false positive (green), and true positive (yellow) SNPs called (bars and left y‐axis) with the different tools tested for SNP calling from bisulphite sequencing data for one sample (F3_E_BD_27272). Performance metrics are based on the evaluation with rtgtools, and we show here the performance metrics for which f‐measure (a), sensitivity (b), and precision (c) are maximized when using QUAL as the score field (white diamonds and right y‐axis). Note that the QUAL values for which f‐measure, sensitivity, or precision are maximized differ between tools and that precision is maximized on the condition that at least 1,000,000 SNPs were called. Here, only one sample is displayed, but see Figures S14–16 for plots with all samples.
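Selecting the score threshold that maximizes a given metric, optionally under a minimum number of called SNPs, might look like the sketch below. It assumes the usual definition of the f‐measure as the harmonic mean of precision and sensitivity and interprets "SNPs called" as true positives plus false positives; both the function names and the input layout (one dict per threshold) are hypothetical.

```python
def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (0 if both are 0)."""
    total = precision + sensitivity
    return 2 * precision * sensitivity / total if total else 0.0


def best_threshold(rows, metric, min_calls=0):
    """rows: iterable of dicts with keys 'threshold', 'tp', 'fp', 'fn'.
    metric: 'precision', 'sensitivity' or 'f_measure'.
    Thresholds with fewer than min_calls called SNPs (tp + fp) are skipped.
    Returns (threshold, metric value) or None if no threshold is eligible."""
    best = None
    for row in rows:
        tp, fp, fn = row["tp"], row["fp"], row["fn"]
        n_called = tp + fp
        if n_called == 0 or n_called < min_calls:
            continue
        precision = tp / n_called
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        values = {
            "precision": precision,
            "sensitivity": sensitivity,
            "f_measure": f_measure(precision, sensitivity),
        }
        if best is None or values[metric] > best[1]:
            best = (row["threshold"], values[metric])
    return best
```

With counts on the scale of a whole genome, calling `best_threshold(rows, "precision", min_calls=1_000_000)` would mirror the condition stated in the caption, which keeps the precision optimum from collapsing onto a handful of very high-confidence calls.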
FIGURE 3 Distribution of SNPs over substitution contexts (alternative and reference allele) for the baseline list of true SNPs derived from whole‐genome resequencing data (a) and the tool‐specific lists of false positive SNPs (b). Samples are differentiated by colour (teal‐yellow) and plots in (b) have tool‐specific plot titles.
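A distribution over substitution contexts like the one plotted here can be tabulated with a few lines of code. The sketch below is illustrative only: the function name and the input format (pairs of reference and alternative alleles) are assumptions, and the example alleles in the usage line are made up.

```python
from collections import Counter


def substitution_spectrum(snps):
    """snps: iterable of (ref, alt) allele pairs.
    Returns the fraction of SNPs per 'REF>ALT' substitution context,
    ignoring records that are not single-base substitutions."""
    counts = Counter(
        f"{ref}>{alt}"
        for ref, alt in snps
        if len(ref) == 1 and len(alt) == 1 and ref != alt
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {context: n / total for context, n in sorted(counts.items())}


# Made-up example: C>T and G>A are the contexts that bisulphite
# conversion can mimic, so an excess of them among false positives
# would point to conversion-induced errors.
print(substitution_spectrum([("C", "T"), ("C", "T"), ("G", "A"), ("A", "G")]))
```

Comparing the spectrum of a tool's false positives against the spectrum of the baseline SNPs is one way to see whether errors concentrate in the bisulphite-affected contexts.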
Number of true positive SNPs (low‐high), number of false positive SNPs (low‐high), potential for (bisulphite‐induced) ascertainment bias (low‐high), and additional requirements for SNP calling for the seven tools tested
| Tool name (strategy) | Number of true positive SNPs | Number of false positive SNPs | Potential for ascertainment bias | Requirement |
|---|---|---|---|---|
| | Low | Low | High | |
| | High | High | High | |
| | Medium | Medium | High | |
| | Medium | Medium | Medium | |
| | Medium | Medium | High | |
| | Medium | Medium | High | |
| | Medium | Medium | High | |
Abbreviation: BQ, base quality.