Brent S Pedersen1,2, Aaron R Quinlan1,2,3.
Abstract
Most structural variant (SV) detection methods use clusters of discordant read-pair and split-read alignments to identify variants yet do not integrate depth of sequence coverage as an additional means to support or refute putative events. Here, we present "duphold," a new method to efficiently annotate SV calls with sequence depth information that can add (or remove) confidence to SVs that are predicted to affect copy number. Duphold indicates not only the change in depth across the event but also the presence of a rapid change in depth relative to the regions surrounding the breakpoints. It uses a unique algorithm that allows the run time to be nearly independent of the number of variants. This performance is important for large, jointly called projects with many samples, each of which must be evaluated at thousands of sites. We show that filtering on duphold annotations can greatly improve the specificity of SV calls. Duphold can annotate SV predictions made from both short-read and long-read sequencing datasets. It is available under the MIT license at https://github.com/brentp/duphold.
Keywords: algorithm; genomics; structural variation
Year: 2019 PMID: 31222198 PMCID: PMC6479422 DOI: 10.1093/gigascience/giz040
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Evaluating the accuracy of deletion calls filtered by duphold annotations
| Method | False discovery rate | False negative | False positive | True positive | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|---|
| Unfiltered | 0.053 | 276 | 83 | 1,496 | 0.947 | 0.844 | 0.893 |
| DHBFC < 0.7 | 0.018 | 298 | 27 | 1,474 | 0.982 | 0.832 | 0.901 |
| DHFFC < 0.7 | 0.021 | 289 | 32 | 1,483 | 0.979 | 0.837 | 0.902 |
We evaluated deletion calls from LUMPY+svtyper using truvari.py [13] with the GiaB v0.6 truth set. DHBFC: duphold bin fold-change, which compares depth in the event to that of regions (bins) with similar GC content. DHFFC: duphold flank fold-change (with a 1,000-base flank). Using either DHBFC < 0.7 or DHFFC < 0.7 as a filtering criterion for deletions increases precision. With DHFFC, the filter removes 61% [(83 – 32)/83] of false-positive calls while retaining >99% (1,483/1,496) of true-positive calls.
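The summary columns of the table follow directly from the true-positive, false-positive, and false-negative counts. The short Python sketch below recomputes them as a consistency check; the function and variable names are illustrative and are not part of duphold or truvari.

```python
def classification_metrics(tp, fp, fn):
    """Return precision, recall, F1 and false discovery rate from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fdr = fp / (tp + fp)  # equal to 1 - precision
    return {"precision": precision, "recall": recall, "f1": f1, "fdr": fdr}


# Counts copied from the table: (true positives, false positives, false negatives)
COUNTS = {
    "Unfiltered": (1496, 83, 276),
    "DHBFC < 0.7": (1474, 27, 298),
    "DHFFC < 0.7": (1483, 32, 289),
}

METRICS = {name: classification_metrics(*c) for name, c in COUNTS.items()}

# Effect of the DHFFC filter relative to the unfiltered call set
FP_REMOVED = (83 - 32) / 83   # fraction of false positives removed
TP_RETAINED = 1483 / 1496     # fraction of true positives retained

if __name__ == "__main__":
    for name, m in METRICS.items():
        print(name, {k: round(v, 3) for k, v in m.items()})
    print("FP removed:", round(FP_REMOVED, 3), "TP retained:", round(TP_RETAINED, 3))
```

Rounded to three decimals, the recomputed values agree with the precision, recall, F1, and false discovery rate columns reported for all three rows.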
Figure 1: Evaluation of duphold on duplications and deletions of any size. We annotated 805 GiaB insertion calls as duplications and simulated homozygous reference (Hom. Ref.) events of similar size in order to evaluate the specificity and sensitivity of duphold. We show the distribution of DHFFC (duphold flank fold-change) for each genotype (homozygous reference [0/0] is blue, heterozygous [Het.] [0/1] is orange, and homozygous alternate [Hom. Alt.] [1/1] is green), for both duplications (A) and deletions (C). We then used those distributions to create receiver operating characteristic curves (B and D) and calculate AUCs that indicate the ability of duphold to differentiate 0/0 from 0/1 (orange) and 1/1 (green). The dots on the curves indicate a cutoff of 1.3 for duplications and 0.7 for deletions.
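To make the flank fold-change and the two cutoffs concrete, the following is a minimal Python sketch, not duphold's implementation. It assumes a per-base depth list for one chromosome as input, summarizes depth with the median (an assumption; this text does not say which statistic duphold uses), and assumes that a deletion is supported when the fold-change is below 0.7 and a duplication when it is above 1.3. All function and parameter names are hypothetical.

```python
from statistics import median


def flank_fold_change(depth, start, end, flank=1000):
    """Median depth inside [start, end) divided by median depth in the flanks.

    depth: per-base depth values for one chromosome (hypothetical input).
    flank: number of bases taken on each side of the event.
    """
    inside = list(depth[start:end])
    left = list(depth[max(0, start - flank):start])
    right = list(depth[end:end + flank])
    flanks = left + right
    if not inside or not flanks:
        raise ValueError("event or flanking region is empty")
    flank_depth = median(flanks)
    if flank_depth == 0:
        return float("nan")  # no coverage in the flanks, ratio undefined
    return median(inside) / flank_depth


def supported_by_depth(svtype, fold_change, del_cutoff=0.7, dup_cutoff=1.3):
    """Apply the cutoffs quoted in the Figure 1 caption (direction is assumed)."""
    if svtype == "DEL":
        return fold_change < del_cutoff
    if svtype == "DUP":
        return fold_change > dup_cutoff
    raise ValueError(f"unsupported SV type: {svtype}")


# Toy example: 30x background with a 500-base heterozygous deletion at 15x
toy_depth = [30] * 1000 + [15] * 500 + [30] * 1000
toy_fc = flank_fold_change(toy_depth, 1000, 1500)
```

In the toy example the event has half the flanking depth, so the fold-change is 0.5 and the call would pass a deletion filter at 0.7. A region with no copy-number change gives a fold-change near 1.0 and fails both filters.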
Figure 2: Duphold scalability. The time to annotate (or genotype) for duphold and svtyper is shown (y-axis) as a function of the number of variants tested (x-axis). While svtyper (blue) exhibits a linear increase in time with the number of variants, the run time of duphold is relatively independent of the number of variants. There is an initial cost that makes the duphold strategy less efficient for small numbers of variants (fewer than ∼10,000), but it scales well to annotating thousands of variants, as we expect for large cohorts.
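This text does not describe how duphold achieves that scaling. As a generic illustration only, the Python sketch below shows one common way to trade an up-front cost for near-constant work per query: a prefix-sum array built once over per-base depth, after which the mean depth of any interval takes two lookups. It is an assumption for illustration and should not be read as duphold's algorithm; the names are hypothetical.

```python
from itertools import accumulate


def build_prefix_sums(depth):
    """One pass over per-base depth; prefix[i] is the sum of depth[:i]."""
    return [0] + list(accumulate(depth))


def mean_depth(prefix, start, end):
    """Mean depth over [start, end) in constant time using the prefix sums."""
    if end <= start:
        raise ValueError("interval must have positive length")
    return (prefix[end] - prefix[start]) / (end - start)


# The up-front cost is paid once, then each variant costs two array lookups.
toy_depth = [10, 10, 20, 20, 30]
toy_prefix = build_prefix_sums(toy_depth)
toy_mean = mean_depth(toy_prefix, 2, 4)
```

With this layout the total run time is dominated by building the array, which matches the shape described in the caption: a fixed initial cost, then little additional cost as the number of variants grows.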