
SVNN: an efficient PacBio-specific pipeline for structural variations calling using neural networks.

Shaya Akbarinejad, Mostafa Hadadian Nejad Yousefi, Maziar Goudarzi.

Abstract

BACKGROUND: Once aligned, long reads are a useful source of information for identifying the type and position of structural variations. However, because long reads have a high sequencing error rate, long-read structural variation detection methods are far from precise in low-coverage cases. To be accurate, they need high-coverage data, which in turn results in an extremely time-consuming pipeline, especially in the alignment phase. It is therefore important to have a structural variation calling pipeline that is both fast and precise for low-coverage data.
RESULTS: In this paper, we present SVNN, a fast yet accurate structural variation calling pipeline for PacBio long reads that takes raw reads as input and detects structural variants larger than 50 bp. Our pipeline uses state-of-the-art long-read aligners, namely NGMLR and Minimap2, and structural variation callers, namely Sniffles and SVIM. We found that a neural network can use features extracted from the Minimap2 output to detect the subset of reads that provide useful information for structural variation detection. By mapping only this subset with NGMLR, which is far slower than Minimap2 but better serves downstream structural variation detection, we can increase sensitivity efficiently. As a result of combining multiple tools intelligently, SVNN achieves up to 20 percentage points of sensitivity improvement over state-of-the-art methods and is three times faster than a naive combination of state-of-the-art tools while achieving almost the same accuracy.
CONCLUSION: Because the prohibitive cost of high-coverage data has impeded long-read applications, SVNN offers users a much faster structural variation detection platform for PacBio reads, with high precision and sensitivity in low-coverage scenarios.
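
The read-selection idea in RESULTS (align everything with the fast aligner, then pass only the informative reads to the slower one) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which features SVNN extracts or how its network is built, so the three features here (a long indel in the CIGAR string, the soft-clipped fraction, and the number of split alignments), the hand-set weights, the single logistic unit standing in for the neural network, and all names (`FastAlignment`, `sv_score`, `select_reads_for_realignment`) are assumptions made for illustration only. The 50 bp cutoff is the variant size stated in the abstract.

```python
import math
import re
from dataclasses import dataclass

# CIGAR operations as defined by the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

MIN_SV_LEN = 50  # variant size threshold taken from the abstract

# Hypothetical, hand-set parameters; a real pipeline would learn these.
WEIGHTS = (3.0, 4.0, 1.5)
BIAS = -2.0


@dataclass
class FastAlignment:
    """One read's alignment from the fast aligner (hypothetical record)."""
    read_id: str
    cigar: str
    n_supplementary: int  # number of additional split alignments


def features(aln: FastAlignment) -> tuple:
    """Turn one alignment into three illustrative SV-evidence features."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(aln.cigar)]
    # M, I, S, = and X consume bases of the read.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    clipped = sum(n for n, op in ops if op == "S")
    longest_indel = max((n for n, op in ops if op in "ID"), default=0)
    has_long_indel = 1.0 if longest_indel >= MIN_SV_LEN else 0.0
    clipped_fraction = clipped / read_len if read_len else 0.0
    return (has_long_indel, clipped_fraction, float(aln.n_supplementary))


def sv_score(aln: FastAlignment) -> float:
    """Logistic score in (0, 1); a stand-in for a trained neural network."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(aln)))
    return 1.0 / (1.0 + math.exp(-z))


def select_reads_for_realignment(alignments, threshold=0.5):
    """Return IDs of reads worth re-mapping with the slower aligner."""
    return [a.read_id for a in alignments if sv_score(a) >= threshold]


example_reads = [
    FastAlignment("r1", "1000M", 0),         # clean alignment
    FastAlignment("r2", "400M120D600M", 0),  # contains a 120 bp deletion
    FastAlignment("r3", "300S700M", 1),      # clipped and split
]
selected = select_reads_for_realignment(example_reads)
```

In this sketch only the reads in `selected` would be handed to the slower aligner, which is where the time saving described in the abstract would come from; the remaining reads keep their fast alignments.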


Keywords:  Long reads; Neural networks; PacBio; Structural variation calling

Year:  2021        PMID: 34147063     DOI: 10.1186/s12859-021-04184-7

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  1 in total

Review 1.  Progress in Methods for Copy Number Variation Profiling.

Authors:  Veronika Gordeeva; Elena Sharova; Georgij Arapidi
Journal:  Int J Mol Sci       Date:  2022-02-15       Impact factor: 5.923

