Ramesh Rajaby (1,2), Wing-Kin Sung (1,3)
(1) School of Computing, National University of Singapore, 13 Computing Drive, Singapore.
(2) NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, Singapore.
(3) Genome Institute of Singapore, 60 Biopolis Street, Genome, Singapore.
Abstract
MOTIVATION: Structural variations (SVs) are large-scale mutations in a genome; although less frequent than point mutations, their large size makes them responsible for more heritable differences between individuals. Two prominent classes of SVs are deletions and tandem duplications. They play important roles in many devastating genetic diseases, such as Smith-Magenis syndrome, Potocki-Lupski syndrome and Williams-Beuren syndrome. Since paired-end whole-genome sequencing data has become widespread and affordable, reliably calling deletions and tandem duplications has been a major target in bioinformatics; unfortunately, the problem is far from solved, since existing solutions often give poor results when applied to real data.
RESULTS: We developed a novel caller, SurVIndel, which focuses on detecting deletions and tandem duplications from paired-end next-generation sequencing data. SurVIndel uses discordant paired reads and clipped reads, as well as statistical methods. We show that SurVIndel outperforms existing methods on both simulated and real biological datasets.
AVAILABILITY: SurVIndel is available at https://github.com/Mesh89/SurVIndel.