A parallel algorithm for N-way interval set intersection.

Abstract

The comparison of sets of genome intervals (e.g., genes, repeats, ChIP-seq peaks) is essential to genome research, especially as modern sequencing technologies enable ever larger and more complex experiments. Relationships between genomic features are commonly identified by their intersection: that is, if feature sets contain overlapping intervals, then it is inferred that they share a common biological function or origin. Using this technique, researchers identify genomic regions that are common among multiple datasets or unique to individual ones. While there have been recent advances in algorithms for pairwise intersection of two sets of genomic intervals, few advances have been made for the intersection of many sets. Identifying intersections among many interval sets is particularly important when attempting to distill biological insights from the massive, multi-dimensional datasets that are common to modern genome research. For such analyses, speed and efficiency are crucial given the size and sheer number of datasets involved. To solve this problem, we present a novel "slice-then-sweep" algorithm that, given N interval sets, efficiently reveals the subset of intervals that are common to all N sets. We demonstrate that our algorithm is more efficient in the sequential case and has a vastly higher capacity for parallelization, with a 19x speedup over the existing algorithm.
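To make the N-way intersection problem concrete, the following is a minimal sequential sketch in Python. It is not the "slice-then-sweep" algorithm described in the abstract, whose details are not given here; it is a plain endpoint sweep, shown only to illustrate what "common to all N sets" means. The function name `n_way_intersection`, the half-open `[start, end)` coordinate convention, and the choice to report common regions rather than the original intervals are all assumptions made for this illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed half-open [start, end), as in BED files

START, END = 0, 1  # START sorts before END at the same position


def n_way_intersection(sets: List[List[Interval]]) -> List[Interval]:
    """Return the regions covered by at least one interval from every set."""
    n = len(sets)
    if n == 0:
        return []

    # Turn every interval into a start event and an end event.
    events = []
    for set_id, intervals in enumerate(sets):
        for start, end in intervals:
            if start < end:  # skip empty or malformed intervals
                events.append((start, START, set_id))
                events.append((end, END, set_id))
    events.sort()

    active = [0] * n     # number of currently open intervals per set
    covered = 0          # number of sets with at least one open interval
    region_start = None  # where the current all-sets region began
    result = []

    for pos, kind, set_id in events:
        if kind == START:
            active[set_id] += 1
            if active[set_id] == 1:
                covered += 1
                if covered == n:
                    region_start = pos
        else:
            active[set_id] -= 1
            if active[set_id] == 0:
                # A set just dropped out; close the region if one was open.
                if covered == n and pos > region_start:
                    result.append((region_start, pos))
                covered -= 1
    return result


if __name__ == "__main__":
    a = [(1, 10), (20, 30)]
    b = [(5, 25)]
    c = [(0, 7), (8, 22)]
    print(n_way_intersection([a, b, c]))
```

Sorting all endpoints dominates the cost, so this sketch runs in O(M log M) time for M total intervals. Processing start events before end events at the same coordinate keeps abutting intervals within one set contiguous, while the `pos > region_start` check discards zero-length regions where intervals from different sets merely touch.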

Keywords: Genomic interval intersection; bioinformatics; computational biology; genome analysis; parallel algorithm

Year: 2017 PMID: 30333632 PMCID: PMC6188649 DOI: 10.1109/JPROC.2015.2461494

Source DB: PubMed Journal: Proc IEEE Inst Electr Electron Eng ISSN: 0018-9219 Impact factor: 10.961
