Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases.

Alexander V Alekseyenko, Christopher J Lee.

Abstract

MOTIVATION: The exponential growth of sequence databases poses a major challenge to bioinformatics tools for querying alignment and annotation databases. There is a pressing need for methods for finding overlapping sequence intervals that are highly scalable to database size, query interval size, result size and construction/updating of the interval database.
RESULTS: We have developed a new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set. In all cases tested, this query algorithm is 5-500-fold faster than other indexing methods tested in this study, such as MySQL multi-column indexing, MySQL binning and R-Tree indexing. We provide performance comparisons both in simulated datasets and real-world genome alignment databases, across a wide range of database sizes and query interval widths. We also present an in-place NCList construction algorithm that yields database construction times that are approximately 100-fold faster than other methods available. The NCList data structure appears to provide a useful foundation for highly scalable interval database applications.
AVAILABILITY: NCList data structure is part of Pygr, a bioinformatics graph database library, available at http://sourceforge.net/projects/pygr
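
The abstract names the data structure but does not spell out how it works. The following is a minimal Python sketch of the general nested-containment idea, not the Pygr implementation and not the in-place construction algorithm the paper describes. It assumes half-open intervals [start, end) given as tuples; the class name `NCList`, the helper `_first_end_after`, and the list-based node layout are all hypothetical choices made for illustration. The key property is that any interval fully contained in another is moved into that interval's sublist, so within each list both starts and ends increase, and a binary search on ends finds the first candidate overlap.

```python
def _first_end_after(nodes, qs):
    """Index of the first node whose end is > qs (ends are sorted within a list)."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid][1] > qs:
            hi = mid
        else:
            lo = mid + 1
    return lo


class NCList:
    """Illustrative nested containment list over half-open intervals [start, end)."""

    def __init__(self, intervals):
        # Sort by start ascending, then end descending, so a containing
        # interval always comes before the intervals it contains.
        ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
        self.top = []  # each node is [start, end, children]
        stack = []     # chain of containers that are still "open"
        for start, end in ordered:
            # Drop containers that end before the new interval does:
            # they cannot contain it.
            while stack and stack[-1][1] < end:
                stack.pop()
            node = [start, end, []]
            (stack[-1][2] if stack else self.top).append(node)
            stack.append(node)

    def query(self, qs, qe):
        """Return all stored intervals overlapping [qs, qe)."""
        hits = []
        pending = [self.top]
        while pending:
            nodes = pending.pop()
            i = _first_end_after(nodes, qs)
            # Starts are sorted too, so stop at the first start >= qe.
            while i < len(nodes) and nodes[i][0] < qe:
                start, end, children = nodes[i]
                hits.append((start, end))
                if children:
                    # Only overlapping containers can hold overlapping children.
                    pending.append(children)
                i += 1
        return hits


demo = NCList([(0, 10), (2, 5), (3, 4), (6, 9), (12, 20), (15, 18), (25, 30)])
print(sorted(demo.query(4, 7)))  # [(0, 10), (2, 5), (6, 9)]
```

Sublists of intervals that do not overlap the query are never visited, which is the intuition behind a query cost that depends on the result size n plus a logarithmic search term. This sketch makes no attempt to reproduce the exact bound or the benchmark figures reported above.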

Year:  2007        PMID: 17234640     DOI: 10.1093/bioinformatics/btl647

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

1.  Visualizing next-generation sequencing data with JBrowse.

Authors:  Oscar Westesson; Mitchell Skinner; Ian Holmes
Journal:  Brief Bioinform       Date:  2012-03-12       Impact factor: 11.622

2.  Tabix: fast retrieval of sequence features from generic TAB-delimited files.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2011-01-05       Impact factor: 6.937

3.  Retrotransposon profiling of RNA polymerase III initiation sites.

Authors:  Xiaojie Qi; Kenneth Daily; Kim Nguyen; Haoyi Wang; David Mayhew; Paul Rigor; Sholeh Forouzan; Mark Johnston; Robi David Mitra; Pierre Baldi; Suzanne Sandmeyer
Journal:  Genome Res       Date:  2012-01-27       Impact factor: 9.043

4.  JBrowse: a next-generation genome browser.

Authors:  Mitchell E Skinner; Andrew V Uzilov; Lincoln D Stein; Christopher J Mungall; Ian H Holmes
Journal:  Genome Res       Date:  2009-07-01       Impact factor: 9.043

5.  Binary Interval Search: a scalable algorithm for counting interval intersections.

Authors:  Ryan M Layer; Kevin Skadron; Gabriel Robins; Ira M Hall; Aaron R Quinlan
Journal:  Bioinformatics       Date:  2012-11-04       Impact factor: 6.937

6.  PeakAnalyzer: genome-wide annotation of chromatin binding and modification loci.

Authors:  Mali Salmon-Divon; Heidi Dvinge; Kairi Tammoja; Paul Bertone
Journal:  BMC Bioinformatics       Date:  2010-08-06       Impact factor: 3.169

7.  BigWig and BigBed: enabling browsing of large distributed datasets.

Authors:  W J Kent; A S Zweig; G Barber; A S Hinrichs; D Karolchik
Journal:  Bioinformatics       Date:  2010-07-17       Impact factor: 6.937

8.  A parallel algorithm for N-way interval set intersection.

Authors:  Ryan M Layer; Aaron R Quinlan
Journal:  Proc IEEE Inst Electr Electron Eng       Date:  2017-03       Impact factor: 10.961

9.  Bedtk: finding interval overlap with implicit interval tree.

Authors:  Heng Li; Jiazhen Rong
Journal:  Bioinformatics       Date:  2021-06-09       Impact factor: 6.937

10.  Predicting functional alternative splicing by measuring RNA selection pressure from multigenome alignments.

Authors:  Hongchao Lu; Lan Lin; Seiko Sato; Yi Xing; Christopher J Lee
Journal:  PLoS Comput Biol       Date:  2009-12-18       Impact factor: 4.475
