Augmented Interval List: a novel data structure for efficient genomic interval search.

Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield.

Abstract

MOTIVATION: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary.
RESULTS: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂ N + n + m), where N is the number of intervals in the set R, n is the number of overlaps between R and q, and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5-18 times faster than standard high-performance code based on augmented interval trees, nested containment lists or R-trees (BEDTools). For large datasets, the memory usage for AIList is 4-60% of that of other methods. The AIList data structure, therefore, provides a significantly improved fundamental operation for highly scalable genomic data analysis.
AVAILABILITY AND IMPLEMENTATION: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
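
The construction and query steps described in the RESULTS paragraph can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation: intervals are assumed to be half-open (start, end) tuples, and the function names (decompose, build, query) and the thresholds lookahead, min_cov, max_comp and min_len are hypothetical choices made for this sketch. The decomposition rule used here (move an interval to the next sublist when it covers many of its successors) is one plausible way to obtain "approximately flattened" components.

```python
from bisect import bisect_left
from itertools import accumulate


def decompose(intervals, lookahead=20, min_cov=10, max_comp=10, min_len=64):
    """Split start-sorted intervals into sublists (all thresholds are assumptions).

    An interval that covers at least `min_cov` of the next `lookahead`
    intervals is moved to the next sublist, so each sublist stays roughly flat.
    """
    components = []
    current = sorted(intervals)  # sort by start coordinate
    while len(components) < max_comp - 1 and len(current) > min_len:
        keep, extracted = [], []
        for i, (start, end) in enumerate(current):
            window = current[i + 1 : i + 1 + lookahead]
            covered = sum(1 for _, e in window if e < end)
            (extracted if covered >= min_cov else keep).append((start, end))
        if not extracted:  # already flat: nothing left to move
            break
        components.append(keep)
        current = extracted
    components.append(current)
    return components


def build(intervals, **params):
    """Augment every sublist with the running maximum of interval ends."""
    index = []
    for comp in decompose(intervals, **params):
        starts = [s for s, _ in comp]
        max_ends = list(accumulate((e for _, e in comp), max))
        index.append((comp, starts, max_ends))
    return index


def query(index, q_start, q_end):
    """Return all stored intervals overlapping the half-open [q_start, q_end)."""
    hits = []
    for comp, starts, max_ends in index:
        # Binary search: rightmost interval whose start lies before q_end.
        i = bisect_left(starts, q_end) - 1
        # Scan backward; stop once no earlier interval can reach q_start.
        while i >= 0 and max_ends[i] > q_start:
            if comp[i][1] > q_start:
                hits.append(comp[i])
            i -= 1
    return hits


index = build([(1, 5), (3, 8), (10, 12), (2, 20)])
print(sorted(query(index, 6, 11)))  # [(2, 20), (3, 8), (10, 12)]
```

In this sketch the binary search accounts for the logarithmic term of the query time, the reported hits for n, and the intervals that are visited during the backward scan but do not overlap the query for the extra comparisons m. Moving long covering intervals into their own sublists is what keeps that last term small.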

Year:  2019        PMID: 31150060      PMCID: PMC6901075          DOI: 10.1093/bioinformatics/btz407

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


  8 in total

1.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

2.  Galaxy: a platform for interactive large-scale genome analysis.

Authors:  Belinda Giardine; Cathy Riemer; Ross C Hardison; Richard Burhans; Laura Elnitski; Prachi Shah; Yi Zhang; Daniel Blankenberg; Istvan Albert; James Taylor; Webb Miller; W James Kent; Anton Nekrutenko
Journal:  Genome Res       Date:  2005-09-16       Impact factor: 9.043

3.  Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases.

Authors:  Alexander V Alekseyenko; Christopher J Lee
Journal:  Bioinformatics       Date:  2007-01-18       Impact factor: 6.937

4.  fjoin: simple and efficient computation of feature overlaps.

Authors:  Joel E Richardson
Journal:  J Comput Biol       Date:  2006-10       Impact factor: 1.479

5.  BEDOPS: high-performance genomic feature operations.

Authors:  Shane Neph; M Scott Kuehn; Alex P Reynolds; Eric Haugen; Robert E Thurman; Audra K Johnson; Eric Rynes; Matthew T Maurano; Jeff Vierstra; Sean Thomas; Richard Sandstrom; Richard Humbert; John A Stamatoyannopoulos
Journal:  Bioinformatics       Date:  2012-05-09       Impact factor: 6.937

6.  GIGGLE: a search engine for large-scale integrated genome analysis.

Authors:  Ryan M Layer; Brent S Pedersen; Tonya DiSera; Gabor T Marth; Jason Gertz; Aaron R Quinlan
Journal:  Nat Methods       Date:  2018-01-08       Impact factor: 28.547

7.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

8.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

  5 in total

1.  Bedshift: perturbation of genomic interval sets.

Authors:  Aaron Gu; Hyun Jae Cho; Nathan C Sheffield
Journal:  Genome Biol       Date:  2021-08-20       Impact factor: 13.583

2.  Bedtk: finding interval overlap with implicit interval tree.

Authors:  Heng Li; Jiazhen Rong
Journal:  Bioinformatics       Date:  2021-06-09       Impact factor: 6.937

3.  Ultrafast and scalable variant annotation and prioritization with big functional genomics data.

Authors:  Dandan Huang; Xianfu Yi; Yao Zhou; Hongcheng Yao; Hang Xu; Jianhua Wang; Shijie Zhang; Wenyan Nong; Panwen Wang; Lei Shi; Chenghao Xuan; Miaoxin Li; Junwen Wang; Weidong Li; Hoi Shan Kwan; Pak Chung Sham; Kai Wang; Mulin Jun Li
Journal:  Genome Res       Date:  2020-10-15       Impact factor: 9.043

4.  Seqpare: a novel metric of similarity between genomic interval sets.

Authors:  Selena C Feng; Nathan C Sheffield; Jianglin Feng
Journal:  F1000Res       Date:  2020-06-09

5.  GenomicDistributions: fast analysis of genomic intervals with Bioconductor.

Authors:  Kristyna Kupkova; Jose Verdezoto Mosquera; Jason P Smith; Michał Stolarczyk; Tessa L Danehy; John T Lawson; Bingjie Xue; John T Stubbs; Nathan LeRoy; Nathan C Sheffield
Journal:  BMC Genomics       Date:  2022-04-12       Impact factor: 3.969
