Dominik Grimm, Jörg Hagmann, Daniel Koenig, Detlef Weigel, Karsten Borgwardt.
Abstract
BACKGROUND: One of the major open challenges in next-generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an indel, such as the number of split-read alignments spanning the boundaries of a deletion candidate, or the number of reads that map within a putative deletion. Candidates with a score above a manually defined threshold are then predicted to be true indels. As a consequence, structural variants detected in this manner contain many false positives.
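The threshold-based scheme described in the background can be illustrated with a minimal sketch. It is not the implementation of any particular indel caller: the evidence fields, the weights, and the cutoff below are hypothetical values chosen only to show how evidence and counter-evidence might be combined into a score and compared against a manually defined threshold.

```python
from dataclasses import dataclass


@dataclass
class DeletionCandidate:
    """Evidence counts for one deletion candidate (field names are illustrative)."""
    split_reads: int   # split-read alignments spanning the candidate's boundaries
    reads_inside: int  # reads mapping within the putative deletion (counter-evidence)


# Hypothetical hand-picked weights and cutoff; not taken from any real tool.
SPLIT_READ_WEIGHT = 1.0
INSIDE_READ_PENALTY = 0.5
SCORE_THRESHOLD = 3.0


def score(candidate: DeletionCandidate) -> float:
    """Add support from split reads and subtract a penalty for reads inside the deletion."""
    return (SPLIT_READ_WEIGHT * candidate.split_reads
            - INSIDE_READ_PENALTY * candidate.reads_inside)


def is_predicted_indel(candidate: DeletionCandidate) -> bool:
    """Call the candidate a true indel when its score exceeds the manual threshold."""
    return score(candidate) > SCORE_THRESHOLD


candidates = [
    DeletionCandidate(split_reads=6, reads_inside=0),  # strong support, no counter-evidence
    DeletionCandidate(split_reads=4, reads_inside=4),  # support offset by counter-evidence
]
calls = [is_predicted_indel(c) for c in candidates]
```

Because the cutoff is fixed by hand, moving it in either direction trades missed indels against false positives, which is the weakness the background points to.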
Year: 2013 PMID: 23442375 PMCID: PMC3614465 DOI: 10.1186/1471-2164-14-132
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Description and categorization of features. The first category of features includes deletion candidates only, whereas categories 2-4 contain deletions and insertions.
Figure 2. Boxplots showing the performance for different feature sets. The first two boxplots show the AUC and Spec-Sens-BEP of the single f1 feature for deletions, the next two for insertions. The last four boxplots show the AUC and Spec-Sens-BEP using a set of 13 features for insertions (f13) and 17 features for deletions (f17).
Figure 3. Feature contributions. Learned weights of a linear SVM trained on 17 features for deletions (A) and 13 features for insertions (B). A positive weight contributes to the support of an indel, whereas a negative weight contributes to its rejection.
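The sign interpretation in the Figure 3 caption follows from how a linear classifier scores an example: each feature adds its weight times its value to the decision value. The sketch below is only an illustration under stated assumptions. The three weights, the feature values, and the zero bias are hypothetical (the actual models use 17 and 13 features), and feature values are assumed to be non-negative, such as read counts, so that the sign of the weight alone determines the direction of the contribution.

```python
from typing import List, Sequence


def contributions(weights: Sequence[float], features: Sequence[float]) -> List[float]:
    """Per-feature contribution to the decision value: weight times feature value."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return [w * x for w, x in zip(weights, features)]


def decision_value(weights: Sequence[float], features: Sequence[float],
                   bias: float = 0.0) -> float:
    """Linear decision value w . x + b; a positive value supports the indel."""
    return sum(contributions(weights, features)) + bias


# Hypothetical example with three non-negative features.
weights = [0.5, -0.25, 0.25]   # positive weights support, negative weights reject
features = [4.0, 2.0, 1.0]

per_feature = contributions(weights, features)
value = decision_value(weights, features)
supports_indel = value > 0
```

Here the second feature pulls the decision value down, but the two positively weighted features outweigh it, so the candidate would be accepted.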
Figure 4. Population structure of 80 accessions of the first phase of the 1001 genomes project. (A) First two principal components (PC1 and PC2) of the covariance matrix of all indels positively predicted by our indel detection tool for 80 strains of Arabidopsis thaliana. (B) First two PCs of all indels detected by the tool Pindel (v0.1) and (C) by Pindel (v0.24).