
RepMaestro: scalable repeat detection on disk-based genome sequences.

Nikolas Askitis, Ranjan Sinha.

Abstract

MOTIVATION: We investigate the problem of exact repeat detection on large genomic sequences. Most existing approaches based on suffix trees and suffix arrays (SAs) are limited either to small sequences or to those that are memory resident. We introduce RepMaestro, software that adapts existing in-memory enhanced SA algorithms so that they scale efficiently to large, disk-resident sequences. Supermaximal repeats, maximal unique matches (MuMs) and pairwise branching tandem repeats are used to demonstrate the practicality of our approach; this is the first such study to use an enhanced SA to detect these repeats in large genome sequences.
RESULTS: The detection of supermaximal repeats was observed to be up to two times faster than Vmatch but, more importantly, was shown to scale efficiently to large genome sequences that Vmatch could not process due to memory constraints (4 GB). Similar results were observed for the detection of MuMs, with RepMaestro shown to scale well and to perform up to six times faster than Vmatch. For tandem repeats, RepMaestro was found to be slower but could nonetheless scale to large disk-resident sequences. These results are a significant advance in the quest for scalable repeat detection.
AVAILABILITY: RepMaestro is available at http://www.naskitis.com.
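
For readers unfamiliar with the repeat types named above, the following is a minimal in-memory sketch of how supermaximal repeats can be read off an enhanced suffix array (a suffix array plus an LCP table). It is not RepMaestro's disk-based method, which the abstract does not detail. It uses the standard characterization: a supermaximal repeat corresponds to a local maximum in the LCP table whose suffixes are preceded by pairwise distinct characters. The function names and the `min_len` parameter are illustrative assumptions, and the suffix array is built naively, so the sketch only suits short strings.

```python
def suffix_array(text):
    # Naive construction (assumption: small inputs only); sorts suffix start positions.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # Kasai-style LCP: lcp[k] is the common prefix length of suffixes sa[k-1] and sa[k].
    n = len(text)
    rank = [0] * n
    for k, s in enumerate(sa):
        rank[s] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def supermaximal_repeats(text, min_len=1):
    # Returns (repeat, sorted start positions) for each supermaximal repeat.
    n = len(text)
    sa = suffix_array(text)
    lcp = lcp_array(text, sa) + [0]  # trailing sentinel simplifies the boundary check
    results = []
    k = 1
    while k < n:
        ell = lcp[k]
        end = k
        # Extend over a run of equal LCP values.
        while end + 1 < n and lcp[end + 1] == ell:
            end += 1
        # Local maximum: the run is strictly higher than both neighbours.
        if ell >= min_len and lcp[k - 1] < ell and lcp[end + 1] < ell:
            starts = sa[k - 1:end + 1]
            # Character to the left of each occurrence (None at the text start).
            left = [text[s - 1] if s > 0 else None for s in starts]
            if len(set(left)) == len(left):  # pairwise distinct => not left-extendable
                results.append((text[sa[k]:sa[k] + ell], sorted(starts)))
        k = end + 1
    return results


print(supermaximal_repeats("banana"))  # [('ana', [1, 3])]
```

In "banana", the repeat "ana" occurs at positions 1 and 3 with different left characters, so it is reported; "na" and "a" are contained in longer repeats and are not.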


Year:  2010        PMID: 20663848     DOI: 10.1093/bioinformatics/btq433

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


