A parallel algorithm for N-way interval set intersection.

Ryan M Layer, Aaron R Quinlan.

Abstract

The comparison of sets of genome intervals (e.g., genes, repeats, ChIP-seq peaks) is essential to genome research, especially as modern sequencing technologies enable ever larger and more complex experiments. Relationships between genomic features are commonly identified by their intersection: if feature sets contain overlapping intervals, it is inferred that those features share a common biological function or origin. Using this technique, researchers identify genomic regions that are common to multiple datasets (or unique to individual ones). While there have been recent advances in algorithms for pairwise intersection of two sets of genomic intervals, few advances have been made for intersecting many sets of genomic intervals. Identifying intersections among many interval sets is particularly important when attempting to distill biological insights from the massive, multi-dimensional datasets common in modern genome research, and for such analyses speed and efficiency are crucial given the size and sheer number of datasets involved. To solve this problem, we present a novel "slice-then-sweep" algorithm that, given N interval sets, efficiently reveals the subset of intervals common to all N sets. We demonstrate that our algorithm is more efficient than the existing algorithm in the sequential case and has a vastly higher capacity for parallelization, achieving a 19× speedup over the existing algorithm.
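The abstract does not spell out the algorithm, so the following is not the authors' implementation. It is a minimal Python sketch, under stated assumptions, of the two ideas the name "slice-then-sweep" suggests: a sweep over sorted interval endpoints that reports the regions covered by at least one interval from every one of the N sets, and a partition of the coordinate range into independent slices that can be swept concurrently and then stitched back together. The function names `sweep_intersect`, `clip` and `slice_then_sweep`, the half-open `[start, end)` coordinates on a single chromosome, the caller-supplied slice `boundaries`, and the choice to report intersected regions (rather than the original intervals) are all hypothetical choices made here; the paper's slicing strategy and output format may differ.

```python
# Hypothetical sketch; NOT the authors' slice-then-sweep implementation.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed half-open [start, end) on one chromosome


def sweep_intersect(sets: List[List[Interval]]) -> List[Interval]:
    """Maximal regions covered by at least one interval from every set."""
    n = len(sets)
    if n == 0:
        return []
    # Each interval contributes a +1 event at its start and a -1 at its end.
    events = []
    for sid, intervals in enumerate(sets):
        for start, end in intervals:
            if start < end:  # ignore empty intervals
                events.append((start, sid, 1))
                events.append((end, sid, -1))
    events.sort()
    depth = [0] * n  # number of open intervals per set
    covered = 0      # number of sets with at least one open interval
    region_start = None
    out: List[Interval] = []
    i = 0
    while i < len(events):
        pos = events[i][0]
        # Apply every event at this position before testing coverage, so
        # back-to-back intervals in one set do not split an output region.
        while i < len(events) and events[i][0] == pos:
            _, sid, delta = events[i]
            was_open = depth[sid] > 0
            depth[sid] += delta
            covered += int(depth[sid] > 0) - int(was_open)
            i += 1
        if covered == n and region_start is None:
            region_start = pos
        elif covered < n and region_start is not None:
            out.append((region_start, pos))
            region_start = None
    return out


def clip(intervals: List[Interval], lo: int, hi: int) -> List[Interval]:
    """Restrict intervals to the slice [lo, hi), dropping those outside it."""
    return [(max(s, lo), min(e, hi)) for s, e in intervals if s < hi and e > lo]


def slice_then_sweep(sets: List[List[Interval]], boundaries: List[int],
                     workers: int = 4) -> List[Interval]:
    """Sweep each slice [boundaries[k], boundaries[k+1]) independently.

    Assumes all intervals lie within [boundaries[0], boundaries[-1]).
    """
    slices = list(zip(boundaries[:-1], boundaries[1:]))

    def run(bounds: Tuple[int, int]) -> List[Interval]:
        lo, hi = bounds
        return sweep_intersect([clip(iv, lo, hi) for iv in sets])

    # Slices share no state, so they can be processed concurrently;
    # pool.map returns results in slice order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(run, slices))
    # Stitch together regions that were cut at a slice boundary.
    merged: List[Interval] = []
    for part in parts:
        for s, e in part:
            if merged and merged[-1][1] == s:
                merged[-1] = (merged[-1][0], e)
            else:
                merged.append((s, e))
    return merged


if __name__ == "__main__":
    a = [(0, 50), (80, 150)]
    b = [(10, 120)]
    c = [(20, 60), (90, 200)]
    print(sweep_intersect([a, b, c]))                  # [(20, 50), (90, 120)]
    print(slice_then_sweep([a, b, c], [0, 100, 200]))  # [(20, 50), (90, 120)]
```

The stitching step is what makes slicing safe in this sketch: a region cut at a slice boundary comes back as two abutting pieces and is rejoined, so the sliced result matches the sequential sweep. The threads only illustrate that slices are independent; in CPython the global interpreter lock prevents them from giving a real CPU speedup, which would instead call for processes or a compiled language.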

Keywords:  Genomic interval intersection; bioinformatics; computational biology; genome analysis; parallel algorithm

Year:  2017        PMID: 30333632      PMCID: PMC6188649          DOI: 10.1109/JPROC.2015.2461494

Source DB:  PubMed          Journal:  Proc IEEE Inst Electr Electron Eng        ISSN: 0018-9219            Impact factor:   10.961

