Zechen Chong, Jue Ruan, Min Gao, Wanding Zhou, Tenghui Chen, Xian Fan, Li Ding, Anna Y Lee, Paul Boutros, Junjie Chen, Ken Chen.
Abstract
We present novoBreak, a genome-wide local assembly algorithm that discovers somatic and germline structural variation breakpoints in whole-genome sequencing data. novoBreak consistently outperformed existing algorithms on real cancer genome data and on synthetic tumors in the ICGC-TCGA DREAM 8.5 Somatic Mutation Calling Challenge, primarily because it more effectively utilized reads spanning breakpoints. novoBreak also demonstrated great sensitivity in identifying short insertions and deletions.
Year: 2016 PMID: 27892959 PMCID: PMC5199621 DOI: 10.1038/nmeth.4084
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. The workflow of the novoBreak algorithm
(a) Short paired-end tumor reads (pairs of grey and black bars connected by dashed lines) are dissected into their constituent k-mers. Indexed k-mers are compared against the reference sequence and the normal reads. Only somatic novo k-mers (red bars), which are unique to the tumor genome, are kept, while germline k-mers (green bar) and reference k-mers (grey bars) are filtered out. (b) The cluster of reads spanning a breakpoint i is found together with the set of novo k-mers that those reads share. A long contig (grey bar) containing a unique breakpoint sequence in the middle (highlighted in red) is assembled from the cluster of reads. (c) The assembled contig is aligned (dashed line) against the reference sequence to infer the exact breakpoint and the associated SV. (d) Each SV breakpoint is scored, ranked and output in a standard variant call format (VCF) file.
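Steps (a) and (b) of the workflow can be illustrated with a minimal Python sketch. It is not the novoBreak implementation: the function names (`kmers`, `novel_kmers`, `reads_by_novel_kmer`), the toy sequences and the choice k = 4 are all hypothetical, and the sketch ignores reverse complements, sequencing errors and k-mer count thresholds that a real tool would have to handle.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def novel_kmers(tumor_reads, reference, normal_reads, k):
    """Return tumor k-mers found in neither the reference nor the normal reads."""
    known = set(kmers(reference, k))      # reference k-mers
    for read in normal_reads:             # germline k-mers
        known.update(kmers(read, k))
    tumor = set()
    for read in tumor_reads:
        tumor.update(kmers(read, k))
    return tumor - known


def reads_by_novel_kmer(tumor_reads, novel, k):
    """Map each novel k-mer to the indices of the tumor reads containing it."""
    index = defaultdict(set)
    for i, read in enumerate(tumor_reads):
        for km in kmers(read, k):
            if km in novel:
                index[km].add(i)
    return index


# Toy data (hypothetical): the first tumor read joins AAAA directly to TTTT,
# as if the CCCCGGGG segment of the reference had been deleted.
reference = "AAAACCCCGGGGTTTT"
normal_reads = ["AAAACCCC", "CCCCAGGG"]   # second read carries a germline variant
tumor_reads = ["AAAATTTT", "CCCCAGGG"]    # same germline variant appears in the tumor

novel = novel_kmers(tumor_reads, reference, normal_reads, k=4)
clusters = reads_by_novel_kmer(tumor_reads, novel, k=4)
```

In this toy case only the k-mers that cross the junction in the first tumor read survive the filter; the k-mers of the germline variant are discarded because they also occur in the normal reads. Reads that share surviving k-mers would then form the cluster passed on to assembly.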
Figure 2. novoBreak performance in the IS3 data.
(a) Precision and recall comparison among the three top-performing tools: novoBreak (green), DELLY (blue) and Manta (red). Stars indicate the best-scoring result of each tool. (b) Comparison of breakpoint precision among the three tools. The x-axis is the offset (in base pairs) between the true and predicted breakpoint coordinates. The y-axis is the fraction of predicted breakpoints at each offset value. (c) INDEL detection sensitivity of GATK HaplotypeCaller, Strelka and novoBreak in the IS4 data. (d) Summary of SV breakpoints detected in the COLO-829 data by novoBreak, BreakDancer, DELLY and Fermi.
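The quantities plotted in panels (a) and (b) can be computed along the following lines. This is a sketch under stated assumptions, not the scoring procedure used in the DREAM challenge: the greedy one-to-one matching, the `tolerance` window, the single-chromosome integer coordinates, the use of matched predictions as the denominator for the offset fractions, and all function names are hypothetical.

```python
from collections import Counter


def match_breakpoints(truth, predicted, tolerance):
    """Greedily pair each predicted position with the nearest unused true
    position within `tolerance` bp. Returns (predicted, true, offset) tuples."""
    unused = set(truth)
    matches = []
    for p in predicted:
        candidates = [t for t in unused if abs(t - p) <= tolerance]
        if candidates:
            # Nearest true breakpoint; ties broken by smaller coordinate.
            t = min(candidates, key=lambda c: (abs(c - p), c))
            unused.remove(t)
            matches.append((p, t, abs(t - p)))
    return matches


def precision_recall(truth, predicted, tolerance):
    """Precision = matched / predicted, recall = matched / true."""
    tp = len(match_breakpoints(truth, predicted, tolerance))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def offset_fractions(truth, predicted, tolerance):
    """Fraction of matched predictions at each absolute offset (assumed denominator)."""
    matches = match_breakpoints(truth, predicted, tolerance)
    counts = Counter(off for _, _, off in matches)
    total = len(matches)
    return {off: n / total for off, n in sorted(counts.items())}


# Toy coordinates (hypothetical), all on one chromosome.
truth = [100, 500, 900, 1300]
predicted = [100, 500, 902, 2000]

precision, recall = precision_recall(truth, predicted, tolerance=10)
fractions = offset_fractions(truth, predicted, tolerance=10)
```

Here three of the four predictions fall within 10 bp of a true breakpoint, so precision and recall are both 0.75, and two of the three matched predictions have an offset of zero.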