DB2: a probabilistic approach for accurate detection of tandem duplication breakpoints using paired-end reads.

Gökhan Yavaş, Mehmet Koyutürk, Meetha P Gould, Sarah McMahon, Thomas LaFramboise.

Abstract

BACKGROUND: With the advent of paired-end high-throughput sequencing, it is now possible to identify various types of structural variation on a genome-wide scale. Although many methods have been proposed for structural variation detection, most do not provide precise boundaries for the variants they identify. In this paper, we propose a new method, Distribution Based detection of Duplication Boundaries (DB2), which detects the breakpoints of tandem duplications, an important class of structural variation, with high precision and recall.
RESULTS: Our computational experiments on simulated data show that DB2 outperforms state-of-the-art methods in finding the breakpoints of tandem duplications, with a higher positive predictive value (precision) in calling the duplications' presence. In particular, DB2's predictions of tandem duplications are correct 99% of the time even for very noisy data, while the space of possible breakpoints is narrowed to within 15 to 20 bp on average. Most existing methods report boundaries as ranges that extend to hundreds of bases, with lower precision. Our method is also highly robust to varying properties of the sequencing library and to the sizes of the tandem duplications, as shown by its stable precision, recall and mean boundary mismatch. We demonstrate our method's efficacy using both simulated paired-end reads and reads generated from a melanoma sample and two ovarian cancer samples. Newly discovered tandem duplications are validated using PCR and Sanger sequencing.
CONCLUSIONS: Our method, DB2, uses discordantly aligned reads, taking into account the distribution of fragment length, to predict tandem duplications along with their breakpoints on a donor genome. The proposed method fine-tunes the breakpoint calls by applying a novel probabilistic framework that incorporates the empirical fragment length distribution to score each feasible breakpoint. DB2 is implemented in the Java programming language and is freely available at http://mendel.gene.cwru.edu/laframboiselab/software.php.
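
The idea of scoring each feasible breakpoint against the empirical fragment length distribution can be illustrated with a minimal sketch. This is not DB2's actual scoring function or its Java code: it is written in Python for brevity, and every function name, the (fwd_start, rev_end) pair representation, the independence assumption across read pairs, and the probability floor are hypothetical choices made for illustration. The sketch assumes each discordant pair spans the junction of a tandem duplication of reference interval [dup_start, dup_end], so that the donor fragment runs from fwd_start to dup_end and then from dup_start to rev_end.

```python
import math
from collections import Counter


def empirical_length_pmf(concordant_lengths):
    """Estimate P(fragment length) from lengths of concordantly mapped pairs."""
    counts = Counter(concordant_lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def implied_fragment_length(fwd_start, rev_end, dup_start, dup_end):
    """Donor fragment length if the pair spans the duplication junction.

    The fragment covers fwd_start..dup_end in the first copy and
    dup_start..rev_end in the second copy (inclusive coordinates).
    """
    return (dup_end - fwd_start + 1) + (rev_end - dup_start + 1)


def log_score(pairs, dup_start, dup_end, pmf, floor=1e-9):
    """Sum of log-probabilities of the implied lengths (pairs assumed independent).

    Lengths never seen in the empirical distribution get the small
    probability `floor` (an assumption) instead of zero.
    """
    total = 0.0
    for fwd_start, rev_end in pairs:
        length = implied_fragment_length(fwd_start, rev_end, dup_start, dup_end)
        total += math.log(pmf.get(length, floor))
    return total


def best_breakpoints(pairs, start_candidates, end_candidates, pmf):
    """Return the (dup_start, dup_end) candidate with the highest log score."""
    candidates = [(s, e) for s in start_candidates for e in end_candidates if s < e]
    return max(candidates, key=lambda c: log_score(pairs, c[0], c[1], pmf))


# Toy data: most fragments are 200 bp long; the true duplication is [1000, 2000].
pmf = empirical_length_pmf([200] * 8 + [199, 201])
pairs = [(1900, 1098), (1950, 1148)]  # (fwd_start, rev_end) of discordant pairs
best = best_breakpoints(pairs, [1000], range(1995, 2006), pmf)
```

In this simplified model the implied length depends only on dup_end - dup_start, so the length distribution by itself pins down the size of the duplication; the absolute positions would have to be constrained further by where the discordant reads map, which the sketch leaves to the caller through the candidate lists.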

Year:  2014        PMID: 24597945      PMCID: PMC4234483          DOI: 10.1186/1471-2164-15-175

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


