Augmented Interval List: a novel data structure for efficient genomic interval search.

Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield.

Abstract

MOTIVATION: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary.
RESULTS: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂N + n + m), where N is the number of intervals in the set R, n is the number of overlaps between R and q, and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5-18 times faster than standard high-performance code based on augmented interval-trees, nested containment lists or R-trees (BEDTools). For large datasets, the memory usage of AIList is 4-60% of that of other methods. The AIList data structure therefore provides a significantly improved fundamental operation for highly scalable genomic data analysis.
AVAILABILITY AND IMPLEMENTATION: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
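
The construction and query steps described under RESULTS can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the URL above for that). The half-open interval convention, the function names, and the decomposition parameters (lookahead, max_cover, min_len, max_components) are all assumptions made for illustration, and the decomposition rule here is a simplified stand-in: an interval that contains many of its immediate successors is moved to a later sublist.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end): an assumption of this sketch


def decompose(sorted_ivs: List[Interval], lookahead: int = 20, max_cover: int = 10,
              min_len: int = 64, max_components: int = 10) -> List[List[Interval]]:
    """Split a start-sorted list into sublists (hypothetical, simplified rule).

    An interval that contains at least `max_cover` of its next `lookahead`
    neighbours is moved to the next sublist, so each sublist stays "flat".
    """
    components = []
    current = sorted_ivs
    while len(current) > min_len and len(components) < max_components - 1:
        keep, moved = [], []
        for i, (s, e) in enumerate(current):
            window = current[i + 1:i + 1 + lookahead]
            covered = sum(1 for _, e2 in window if e2 <= e)
            (moved if covered >= max_cover else keep).append((s, e))
        if not moved:
            break
        components.append(keep)
        current = moved  # still sorted by start, since order is preserved
    components.append(current)
    return components


def build(intervals: List[Interval], **params):
    """Sort by start, decompose, and attach the running maximum end to each sublist."""
    index = []
    for comp in decompose(sorted(intervals), **params):
        starts = [s for s, _ in comp]
        max_ends, running = [], float("-inf")
        for _, e in comp:
            running = max(running, e)
            max_ends.append(running)
        index.append((comp, starts, max_ends))
    return index


def query(index, q_start: int, q_end: int) -> List[Interval]:
    """Return all stored intervals overlapping [q_start, q_end)."""
    hits = []
    for comp, starts, max_ends in index:
        # Binary search: last interval whose start lies before the query end.
        i = bisect_left(starts, q_end) - 1
        # Scan backwards; stop once no earlier interval can reach the query.
        while i >= 0 and max_ends[i] > q_start:
            if comp[i][1] > q_start:
                hits.append(comp[i])
            i -= 1
    return hits


ivs = [(1, 5), (3, 8), (6, 10), (2, 50), (12, 15), (20, 25)]
print(sorted(query(build(ivs), 7, 13)))  # [(2, 50), (3, 8), (6, 10), (12, 15)]
```

In this small example the long interval (2, 50) keeps the running maximum end high, so the backward scan cannot stop early in a single list. Moving such containing intervals into their own sublist is what keeps the extra-comparison term m small in the query time quoted above.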

Year: 2019 PMID: 31150060 PMCID: PMC6901075 DOI: 10.1093/bioinformatics/btz407

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931

8 in total

1. The human genome browser at UCSC.

Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

2. Galaxy: a platform for interactive large-scale genome analysis.

Authors: Belinda Giardine; Cathy Riemer; Ross C Hardison; Richard Burhans; Laura Elnitski; Prachi Shah; Yi Zhang; Daniel Blankenberg; Istvan Albert; James Taylor; Webb Miller; W James Kent; Anton Nekrutenko
Journal: Genome Res Date: 2005-09-16 Impact factor: 9.043

3. Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases.

Authors: Alexander V Alekseyenko; Christopher J Lee
Journal: Bioinformatics Date: 2007-01-18 Impact factor: 6.937

4. fjoin: simple and efficient computation of feature overlaps.

Authors: Joel E Richardson
Journal: J Comput Biol Date: 2006-10 Impact factor: 1.479

5. BEDOPS: high-performance genomic feature operations.

Authors: Shane Neph; M Scott Kuehn; Alex P Reynolds; Eric Haugen; Robert E Thurman; Audra K Johnson; Eric Rynes; Matthew T Maurano; Jeff Vierstra; Sean Thomas; Richard Sandstrom; Richard Humbert; John A Stamatoyannopoulos
Journal: Bioinformatics Date: 2012-05-09 Impact factor: 6.937

6. GIGGLE: a search engine for large-scale integrated genome analysis.

Authors: Ryan M Layer; Brent S Pedersen; Tonya DiSera; Gabor T Marth; Jason Gertz; Aaron R Quinlan
Journal: Nat Methods Date: 2018-01-08 Impact factor: 28.547

7. BEDTools: a flexible suite of utilities for comparing genomic features.

Authors: Aaron R Quinlan; Ira M Hall
Journal: Bioinformatics Date: 2010-01-28 Impact factor: 6.937

8. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

5 in total

1. Bedshift: perturbation of genomic interval sets.

Authors: Aaron Gu; Hyun Jae Cho; Nathan C Sheffield
Journal: Genome Biol Date: 2021-08-20 Impact factor: 13.583

2. Bedtk: finding interval overlap with implicit interval tree.

Authors: Heng Li; Jiazhen Rong
Journal: Bioinformatics Date: 2021-06-09 Impact factor: 6.937

3. Ultrafast and scalable variant annotation and prioritization with big functional genomics data.

Authors: Dandan Huang; Xianfu Yi; Yao Zhou; Hongcheng Yao; Hang Xu; Jianhua Wang; Shijie Zhang; Wenyan Nong; Panwen Wang; Lei Shi; Chenghao Xuan; Miaoxin Li; Junwen Wang; Weidong Li; Hoi Shan Kwan; Pak Chung Sham; Kai Wang; Mulin Jun Li
Journal: Genome Res Date: 2020-10-15 Impact factor: 9.043

4. Seqpare: a novel metric of similarity between genomic interval sets.

Authors: Selena C Feng; Nathan C Sheffield; Jianglin Feng
Journal: F1000Res Date: 2020-06-09

5. GenomicDistributions: fast analysis of genomic intervals with Bioconductor.

Authors: Kristyna Kupkova; Jose Verdezoto Mosquera; Jason P Smith; Michał Stolarczyk; Tessa L Danehy; John T Lawson; Bingjie Xue; John T Stubbs; Nathan LeRoy; Nathan C Sheffield
Journal: BMC Genomics Date: 2022-04-12 Impact factor: 3.969