Rebecca Elyanow¹, Hsin-Ta Wu¹, Benjamin J Raphael². ¹Center for Computational Molecular Biology, Brown University, Providence, RI, USA. ²Department of Computer Science, Princeton University, Princeton, NJ, USA.
Abstract
MOTIVATION: Structural variation, including large deletions, duplications, inversions, translocations and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and are therefore difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (∼5 to 10) of DNA molecules, ∼50 kbp in length, with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identifying structural variants.

RESULTS: We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification, including two recent methods that also analyze linked reads, on simulated sequencing data and on 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes.

AVAILABILITY AND IMPLEMENTATION: Software is available at compbio.cs.brown.edu/software.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
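The long-range signal described above can be illustrated with a small sketch. This is not NAIBR's probabilistic model, which the abstract does not spell out; it shows only the basic shared-barcode idea, under assumptions labeled here: reads are represented by a hypothetical `LinkedRead` tuple (chromosome, alignment position, barcode), the genome is binned into fixed-size windows, and `shared_barcodes` is a hypothetical helper. Because reads sharing a barcode come from the same few long molecules, two windows far apart on the reference that share many barcodes hint that those loci are adjacent in the sequenced genome.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

Window = Tuple[str, int]  # (chromosome, window index)


class LinkedRead(NamedTuple):
    """Hypothetical minimal record for an aligned, barcoded read."""
    chrom: str
    pos: int
    barcode: str


def barcodes_by_window(reads: Iterable[LinkedRead], window: int) -> Dict[Window, Set[str]]:
    """Collect the set of barcodes whose reads align inside each fixed-size window."""
    bins: Dict[Window, Set[str]] = defaultdict(set)
    for r in reads:
        bins[(r.chrom, r.pos // window)].add(r.barcode)
    return bins


def shared_barcodes(reads: Iterable[LinkedRead], window: int, a: Window, b: Window) -> int:
    """Count barcodes observed in both window a and window b.

    Many shared barcodes between windows that are distant on the reference
    suggest a candidate novel adjacency joining the two loci.
    """
    bins = barcodes_by_window(reads, window)
    return len(bins.get(a, set()) & bins.get(b, set()))


# Toy example: molecules tagged BC1 and BC2 span a junction joining
# chr1:~1.0 Mbp to chr1:~5.0 Mbp; BC3 covers only the first locus.
example_reads = [
    LinkedRead("chr1", 1_002_000, "BC1"),
    LinkedRead("chr1", 5_001_000, "BC1"),
    LinkedRead("chr1", 1_004_000, "BC2"),
    LinkedRead("chr1", 5_003_000, "BC2"),
    LinkedRead("chr1", 1_006_000, "BC3"),
]
print(shared_barcodes(example_reads, 10_000, ("chr1", 100), ("chr1", 500)))  # prints 2
```

Since each barcode labels roughly 5 to 10 molecules, some barcode sharing between unrelated windows happens by chance. A practical caller therefore needs a background model and additional evidence, such as the multiple signals NAIBR combines, rather than a raw count.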