
S-conLSH: alignment-free gapped mapping of noisy long reads.

Angana Chakraborty, Burkhard Morgenstern, Sanghamitra Bandyopadhyay.

Abstract

BACKGROUND: The advancement of SMRT technology has opened new opportunities for genome analysis through its longer read lengths and low GC bias. Aligning the reads to their correct positions in the reference genome is the first, but also the costliest, step of any analysis pipeline based on SMRT sequencing. However, state-of-the-art aligners often fail to identify distant homologies because conserved regions are lacking, a consequence of frequent genetic duplication and recombination. We therefore developed a novel alignment-free method of sequence mapping that is fast and accurate.
RESULTS: We present a new mapper called S-conLSH that uses Spaced context based Locality Sensitive Hashing. With multiple spaced patterns, S-conLSH performs gapped mapping of noisy long reads to their corresponding target locations in a reference genome. We examined the performance of the proposed method on 5 different real and simulated datasets. S-conLSH is at least 2 times faster than the recently developed method lordFAST. On simulated human sequence data, it achieves a sensitivity of 99% without using any traditional base-to-base alignment. By default, S-conLSH produces an alignment-free mapping in PAF format; optionally, it can generate aligned output as a SAM file if one is required for downstream processing.
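The Results describe mapping reads by hashing them with spaced patterns. Below is a minimal Python sketch of that general idea under stated assumptions. It is not S-conLSH's actual algorithm. Each spaced pattern is a string in which '1' marks a compared position and '0' a skipped ("don't care") position, and it turns every window of a sequence into a spaced word. The reference's spaced words are stored in a hash table, and a read is placed at the reference offset (diagonal) that collects the most hash hits. The function names (`spaced_words`, `build_index`, `map_read`), the two example patterns, the toy data, and the diagonal-voting rule are all hypothetical. The actual context-based LSH functions and parameters of S-conLSH are not reproduced here.

```python
import random
from collections import Counter, defaultdict


def spaced_words(seq, pattern):
    """Yield (offset, word) for every window of len(pattern) in seq.

    Only positions marked '1' in the pattern contribute to the word;
    positions marked '0' are "don't care" and are skipped.
    """
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    for start in range(len(seq) - span + 1):
        yield start, "".join(seq[start + i] for i in care)


def build_index(reference, patterns):
    """Hash table mapping (pattern id, spaced word) -> list of reference offsets."""
    index = defaultdict(list)
    for pid, pattern in enumerate(patterns):
        for start, word in spaced_words(reference, pattern):
            index[(pid, word)].append(start)
    return index


def map_read(read, index, patterns):
    """Return (reference start, votes) for the best-supported diagonal, or None."""
    votes = Counter()
    for pid, pattern in enumerate(patterns):
        for start, word in spaced_words(read, pattern):
            for ref_start in index.get((pid, word), ()):
                # ref_start - start is the implied start of the read on the reference
                votes[ref_start - start] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


# Hypothetical toy data: a random reference and a 200-base read
# copied from offset 500 with one substitution at read position 50.
random.seed(7)
reference = "".join(random.choice("ACGT") for _ in range(2000))
read = list(reference[500:700])
read[50] = "C" if read[50] == "A" else "A"
read = "".join(read)

patterns = ["1101101101101", "1110010100111"]  # illustrative, not S-conLSH's patterns
index = build_index(reference, patterns)
best = map_read(read, index, patterns)
print(best)  # first element: inferred reference start of the read
```

Votes are counted per exact diagonal. The sketch therefore handles substitutions well, but it spreads hits across neighboring diagonals when insertions or deletions occur, and these are common in noisy long reads. A practical mapper would additionally chain hits across nearby diagonals.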
CONCLUSIONS: S-conLSH is one of the first alignment-free reference genome mapping tools to achieve a high level of sensitivity. The spaced context is especially suitable for extracting distant similarities. The variable-length spaced seeds, or patterns, add flexibility to the proposed algorithm by enabling gapped mapping of noisy long reads. S-conLSH may therefore be considered a promising direction for alignment-free sequence analysis.
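The Conclusions credit spaced patterns with extracting distant similarities. The toy comparison below shows the intuition familiar from spaced-seed methods. It is a hypothetical illustration and not part of S-conLSH. Two sequences that differ at every third position share no contiguous 8-mer at the same offset. A spaced pattern of the same weight (8 compared positions), whose '0's fall on the mismatching positions, still matches at every third offset. The sequences, both patterns, and the helper `shared_word_hits` are contrived assumptions chosen to make the effect obvious.

```python
def shared_word_hits(a, b, pattern):
    """Count offsets where a and b agree on every '1' position of the pattern."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return sum(
        all(a[start + i] == b[start + i] for i in care)
        for start in range(min(len(a), len(b)) - span + 1)
    )


# Contrived pair: b differs from a at every position i with i % 3 == 2.
a = "ACGTTGCAAGTCCGATGCTAGCTAGGATCCATGCAT"
b = "".join(
    ("G" if c == "T" else "T") if i % 3 == 2 else c
    for i, c in enumerate(a)
)

contiguous = "11111111"    # weight 8, span 8
spaced = "110110110110"    # weight 8, span 12; '0's skip every third position

print(shared_word_hits(a, b, contiguous), shared_word_hits(a, b, spaced))  # → 0 9
```

Real mismatches are rarely this regular. This is why spaced-seed approaches usually combine several patterns, so that at least one of them is likely to avoid the mismatches in a given region.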

Keywords:  Alignment-free sequence comparison; Locality sensitive hashing; Noisy long SMRT reads; Sequence analysis

Year: 2021   PMID: 33573603   PMCID: PMC7879691   DOI: 10.1186/s12859-020-03918-3

Source DB: PubMed   Journal: BMC Bioinformatics   ISSN: 1471-2105   Impact factor: 3.169


