Adam C English, William J Salerno, Jeffrey G Reid.
Abstract
BACKGROUND: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads.
Year: 2014 PMID: 24915764 PMCID: PMC4082283 DOI: 10.1186/1471-2105-15-180
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Tail Schematic. Schematic of possible tails created by reads representing a deletion (a) and an inversion (c) allele and the structural variants they represent when mapped to the reference (b). Rectangles represent double-stranded genomic sequence. Arrows above and below a rectangle represent reads mapping to the direct and complement strands, respectively. In these examples, all initial alignments align at the 5’ breakpoint of the reference. The read spanning a deletion event creates an epilog that maps to the 3’ breakpoint on the same strand as its corresponding initial alignment. The reads spanning the inversion event breakpoints create prologs that map on the opposite strands of their corresponding initial alignment. While all three piece-alignments would cluster if we considered only their location, their orientations support two separate events in the reference region.
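The clustering argument in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the `Tail` record, the `cluster_tails` helper, the coordinates, and the fixed-width position binning are all hypothetical, chosen only to show that adding strand agreement to the cluster key separates tails that location alone would merge.

```python
from collections import defaultdict
from typing import NamedTuple


class Tail(NamedTuple):
    """Hypothetical record for one initial alignment and its tail."""
    read: str
    breakpoint: int       # reference position where the initial alignment ends
    initial_strand: str   # '+' or '-'
    tail_strand: str      # '+' or '-'


def cluster_tails(tails, window=100):
    """Group tails by coarse position bin AND by strand agreement.

    Binning by integer division is a crude stand-in for real clustering.
    """
    groups = defaultdict(list)
    for t in tails:
        same_strand = t.initial_strand == t.tail_strand
        groups[(t.breakpoint // window, same_strand)].append(t.read)
    return dict(groups)


# Three tails near the same 5' breakpoint (illustrative coordinates).
tails = [
    Tail("deletion_read", 1000, "+", "+"),     # epilog on the same strand
    Tail("inversion_read_1", 1005, "+", "-"),  # prolog on the opposite strand
    Tail("inversion_read_2", 1010, "-", "+"),  # prolog on the opposite strand
]
groups = cluster_tails(tails)
```

With location alone the three reads share one bin; with the orientation flag in the key they fall into two groups, matching the two events the caption describes.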
Figure 2. Simulated ALU Deletion. Plot (a) depicts the raw channels for the 327 bp ALU deletion. Raw channels include coverage (COV), mismatches (MIS), insertions (INS), and deletions (DEL). Plot (b) shows the channels after smoothing, and plot (c) shows the final signal after applying the slope kernel. The gray lines represent the start and end points of the deletion.
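The smoothing and slope steps named in the Figure 2 caption can be sketched as follows. The record does not give the actual kernels, so this sketch assumes a centered moving average for smoothing and a central difference for the slope; the function names, the window width, and the toy DEL channel are all hypothetical.

```python
def moving_average(values, width=3):
    """Smooth a channel with a centered window that shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def slope(values):
    """Central difference; the two endpoints use a one-sided difference."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    out = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        out.append((values[hi] - values[lo]) / (hi - lo))
    return out


# Toy DEL channel: deletion errors rise inside positions 10..19.
del_channel = [0] * 10 + [15] * 10 + [0] * 10
smoothed = moving_average(del_channel, width=3)
signal = slope(smoothed)

start_index = signal.index(max(signal))  # steepest rise, near the start
end_index = signal.index(min(signal))    # steepest fall, near the end
```

In this toy channel the steepest rise and fall of the slope signal sit next to the two edges of the elevated region, which is the role the gray lines play in the figure.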
Performance over 50 down-sampling experiments at 10X and 20X coverage
| TP | 210 | 182 | 187 | 217 | 210 | 234 | 239 | 228 | 237 | 236 |
| FP | 25 | 0 | 7 | 55 | 302 | 1 | 0 | 1 | 3 | 46 |
| FN | 40 | 68 | 63 | 40 | 38 | 16 | 11 | 22 | 13 | 14 |
| Sensitivity | 84.00% | 72.80% | 74.80% | 86.80% | 84.00% | 93.60% | 95.60% | 91.20% | 94.80% | 94.40% |
| PPV | 89.36% | 100.00% | 96.39% | 79.78% | 41.02% | 99.57% | 100.00% | 99.56% | 98.75% | 83.69% |
True Positive (TP), False Positive (FP), and False Negative (FN) counts, Sensitivity, and Positive Predictive Value (PPV). PPV is the probability that any given variant call is true (TP/(TP+FP)). Parameters changed are Coverage (c) and Standard-deviation Threshold (e) for spots signal processing.
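As a quick check of the two ratios, the sketch below recomputes them for the first column of the table. The PPV formula is the one given in the footnote; sensitivity is taken as the standard TP/(TP+FN), which the footnote does not spell out, and the function names are illustrative.

```python
def sensitivity(tp, fn):
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of calls that are true variants: TP / (TP + FP)."""
    return tp / (tp + fp)


# First column of the table: TP = 210, FP = 25, FN = 40.
sens_pct = round(100 * sensitivity(210, 40), 2)
ppv_pct = round(100 * ppv(210, 25), 2)
print(f"Sensitivity {sens_pct:.2f}%  PPV {ppv_pct:.2f}%")
```

These values agree with the 84.00% and 89.36% reported in that column.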