Giuseppe Narzisi, Jason A O'Rawe, Ivan Iossifov, Han Fang, Yoon-Ha Lee, Zihua Wang, Yiyang Wu, Gholson J Lyon, Michael Wigler, Michael C Schatz.
Abstract
We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.
Year: 2014 PMID: 25128977 PMCID: PMC4180789 DOI: 10.1038/nmeth.3069
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. Overview of the Scalpel algorithm workflow
Extracted reads include: well-mapped reads, soft-clipped reads, and reads that fail to map, but are anchored by their mate. The assembled sequences are aligned to a reference using the standard Smith-Waterman-Gotoh alignment algorithm with affine gap penalties.
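The alignment step named in this caption can be illustrated with a minimal sketch of Smith-Waterman-Gotoh local alignment scoring. This is not Scalpel's implementation: the function name `gotoh_local` and all scoring parameters are assumptions chosen for illustration, and a gap of length L is assumed to cost `gap_open + (L - 1) * gap_extend`.

```python
def gotoh_local(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of a vs b with affine gap penalties.

    Illustrative parameters only (assumed, not taken from the paper).
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H[i][j]: best local score of an alignment ending at a[i-1], b[j-1]
    # E[i][j]: best score ending with a gap that consumes b[j-1] only
    # F[i][j]: best score ending with a gap that consumes a[i-1] only
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap or extend an existing one.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# A 4 bp deletion: with affine gaps, one long gap is cheaper than four
# independent gap openings, so the alignment spans the whole event.
reference = "AAAACCCCGGGG"
assembled = "AAAAGGGG"
score = gotoh_local(reference, assembled)
```

Under these assumed scores, opening a gap costs more than extending one, so a single contiguous indel is favored over several scattered gaps, which is why affine penalties suit indel detection.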
Figure 2. Concordance of indels between pipelines
(a) Venn diagram showing the percentage of indels shared between the three pipelines. (b) Size distribution for indels called by each pipeline. 1,000 indels from five categories were analyzed by focused resequencing.
Figure 3. MiSeq validation
Ratio of valid (green) and false (grey) indel calls for the indicated tools, based on position-based matches (a) or exact matches (b). Categories shown: all indels (“All indels”), indels within microsatellites only (“SSRs-only”), indels not within microsatellites (“No-SSRs”), indels of size ≥30 bp from the union of the mutations detected by all three pipelines (“Long Indels”), and indels in the intersection (“Intersection”). (c) Stacked histogram of validation rate by indel size for each variant caller.
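The two matching criteria in panels (a) and (b) can be sketched as follows. The caption does not define them precisely, so this assumes that a position-based match requires only the same chromosome and coordinate, while an exact match also requires identical reference and alternate alleles. The `Indel` record and all function names are hypothetical, and the example calls are made-up toy data.

```python
from typing import NamedTuple


class Indel(NamedTuple):
    """Hypothetical minimal indel record (VCF-like fields)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def position_match(call, validated):
    # Assumed criterion: same chromosome and same coordinate.
    return call.chrom == validated.chrom and call.pos == validated.pos


def exact_match(call, validated):
    # Assumed criterion: position and both alleles agree.
    return call == validated


def validation_rate(calls, validated, matcher):
    """Fraction of calls confirmed by at least one validated indel."""
    if not calls:
        return 0.0
    confirmed = sum(any(matcher(c, v) for v in validated) for c in calls)
    return confirmed / len(calls)


# Toy data, not from the paper.
calls = [
    Indel("chr1", 100, "ACT", "A"),
    Indel("chr1", 200, "G", "GTT"),
    Indel("chr2", 50, "C", "CA"),
]
validated = [
    Indel("chr1", 100, "ACT", "A"),
    Indel("chr1", 200, "G", "GT"),
]
rate_position = validation_rate(calls, validated, position_match)
rate_exact = validation_rate(calls, validated, exact_match)
```

In this toy set, the second call has the right position but the wrong inserted sequence, so it counts as valid under the position-based criterion and false under the exact one.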