Indexing huge genome sequences for solving various problems.

K Sadakane, T Shibuya.

Abstract

As genome sequence databases grow, indexing the sequences for fast queries becomes increasingly important. Suffix trees and suffix arrays are used for simple queries. However, they are not suitable for complicated queries over huge amounts of sequence data, because the indices are stored on disk, which has slow access speed. We propose storing the indices in memory in a compressed form, using the compressed suffix array. It stores the suffix array compactly at the cost of a theoretically small slowdown in access speed. We show experimentally that the overhead of using the compressed suffix array is reasonable in practice. We also propose an approximate string matching algorithm suited to the compressed suffix array. Furthermore, we have constructed the compressed suffix array of the whole human genome. Because its size is about 2 GB, a workstation can hold the search index for the whole data in main memory, which will speed up the solution of various problems in genome informatics.
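
For readers unfamiliar with the underlying index, the following is a minimal Python sketch of a plain (uncompressed) suffix array with an exact-match query by binary search. It is not the authors' compressed suffix array or their approximate matching algorithm; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive construction is only practical for short strings.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction for illustration only: it compares whole suffixes,
    so it does not scale to genome-sized inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of `pattern` in `text`, in increasing order.

    Suffixes that begin with `pattern` form one contiguous block of the
    suffix array, so two binary searches locate the block boundaries.
    """
    m = len(pattern)

    # First index whose length-m suffix prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First index whose length-m suffix prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGTGA"  # toy sequence, not real data
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ACG"))
```

Each query performs a logarithmic number of comparisons against the text. The plain array stores one integer per text position, and that space cost is what a compressed representation aims to reduce.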

Year:  2001        PMID: 11791236

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454


  4 in total

1.  Refgenie: a reference genome resource manager.

Authors:  Michał Stolarczyk; Vincent P Reuter; Jason P Smith; Neal E Magee; Nathan C Sheffield
Journal:  Gigascience       Date:  2020-02-01       Impact factor: 6.524

Review 2.  Mappability and read length.

Authors:  Wentian Li; Jan Freudenberg
Journal:  Front Genet       Date:  2014-11-10       Impact factor: 4.599

Review 3.  Prospects and limitations of full-text index structures in genome analysis.

Authors:  Michaël Vyverman; Bernard De Baets; Veerle Fack; Peter Dawyndt
Journal:  Nucleic Acids Res       Date:  2012-05-13       Impact factor: 16.971

4.  Indexes of large genome collections on a PC.

Authors:  Agnieszka Danek; Sebastian Deorowicz; Szymon Grabowski
Journal:  PLoS One       Date:  2014-10-07       Impact factor: 3.240
