Jin Zhang, Yufeng Wu. Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA. jinzhang@engr.uconn.edu
Abstract
MOTIVATION: Structural variation (SV), such as deletion, is an important type of genetic variation and may be associated with diseases. Although many methods exist for detecting SVs, finding deletions from low-coverage short sequence reads remains challenging. Existing deletion-finding methods for sequence reads either use split-read mapping to detect deletions with exact breakpoints, or rely on discordant insert sizes to estimate the approximate positions of deletions. Neither approach is fully satisfactory with low-coverage sequence reads.

RESULTS: We present SVseq, an efficient two-stage approach that combines split-read mapping with discordant insert size analysis. The first stage is split-read mapping based on the Burrows-Wheeler transform (BWT), which finds candidate deletions. Our split-read mapping method allows mismatches and small indels, so that deletions near other small variations can be discovered and reads with sequencing errors can be used. The second stage filters out false positives by analyzing discordant insert sizes. SVseq is more accurate than an alternative approach when applied to simulated and empirical data, and is also much faster.

AVAILABILITY: The program SVseq can be downloaded at http://www.engr.uconn.edu/~jiz08001/

CONTACT: jinzhang@engr.uconn.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
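The second-stage idea, filtering candidate deletions by discordant insert sizes, can be illustrated with a minimal sketch. This is not SVseq's implementation, which the abstract does not detail. The names `Deletion`, `ReadPair`, `supports`, and `filter_candidates`, the threshold `k`, and the `min_support` parameter are all assumptions made for illustration. The sketch assumes a read pair supports a candidate if it spans the deletion, its mapped insert size is unusually large, and subtracting the deletion length brings it back into the library's expected range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """Hypothetical candidate deletion, e.g. one proposed by split-read mapping."""
    start: int  # first deleted reference position (inclusive)
    end: int    # position just past the last deleted base (exclusive)

    @property
    def length(self) -> int:
        return self.end - self.start


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical paired-end mapping, reduced to its outer coordinates."""
    left_start: int  # leftmost mapped position of the pair
    right_end: int   # rightmost mapped position of the pair

    @property
    def insert_size(self) -> int:
        return self.right_end - self.left_start


def supports(pair, deletion, mean, sd, k=3.0):
    """Return True if the pair is discordant in a way the deletion explains.

    Assumption: 'discordant' means more than k standard deviations above
    the library mean insert size.
    """
    spans = pair.left_start < deletion.start and pair.right_end > deletion.end
    discordant = pair.insert_size > mean + k * sd
    # Removing the deleted bases should restore a normal insert size.
    corrected = pair.insert_size - deletion.length
    return spans and discordant and abs(corrected - mean) <= k * sd


def filter_candidates(candidates, pairs, mean, sd, min_support=1, k=3.0):
    """Keep candidates backed by at least min_support discordant pairs."""
    kept = []
    for deletion in candidates:
        n = sum(supports(p, deletion, mean, sd, k) for p in pairs)
        if n >= min_support:
            kept.append(deletion)
    return kept


if __name__ == "__main__":
    candidates = [Deletion(1000, 1500), Deletion(5000, 5300)]
    pairs = [ReadPair(900, 1610), ReadPair(2000, 2200)]
    print(filter_candidates(candidates, pairs, mean=200.0, sd=10.0))
```

With a library mean of 200 and standard deviation of 10, the pair mapped at 900 to 1610 has an insert size of 710, which drops to 210 once the 500-base deletion is accounted for, so the first candidate is kept. The second candidate has no spanning discordant pair and is filtered out.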