MOTIVATION: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map. RESULTS: Here we present a method for 'split' read mapping, where prefix and suffix match of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant. AVAILABILITY: SplazerS is available from http://www.seqan.de/projects/ splazers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map. RESULTS: Here we present a method for 'split' read mapping, where prefix and suffix match of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant. AVAILABILITY: SplazerS is available from http://www.seqan.de/projects/ splazers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Joshua N Honeyman; Elana P Simon; Nicolas Robine; Rachel Chiaroni-Clarke; David G Darcy; Irene Isabel P Lim; Caroline E Gleason; Jennifer M Murphy; Brad R Rosenberg; Lydia Teegan; Constantin N Takacs; Sergio Botero; Rachel Belote; Soren Germer; Anne-Katrin Emde; Vladimir Vacic; Umesh Bhanot; Michael P LaQuaglia; Sanford M Simon Journal: Science Date: 2014-02-28 Impact factor: 47.728