Ryan M Layer, Aaron R Quinlan.
Abstract
The comparison of sets of genome intervals (e.g., genes, repeats, ChIP-seq peaks) is essential to genome research, especially as modern sequencing technologies enable ever larger and more complex experiments. Relationships between genomic features are commonly identified by their intersection: that is, if feature sets contain overlapping intervals, it is inferred that they share a common biological function or origin. Using this technique, researchers identify genomic regions that are common among multiple (or unique to individual) datasets. While there have been recent advances in algorithms for pairwise intersections between two sets of genomic intervals, few advances have been made in the intersection of many sets of genomic intervals. Identifying intersections among many interval sets is particularly important when attempting to distill biological insights from the massive, multi-dimensional datasets that are common to modern genome research. For such analyses, speed and efficiency are crucial given the size and sheer number of datasets involved. To solve this problem, we present a novel "slice-then-sweep" algorithm that, given N interval sets, efficiently reveals the subset of intervals that are common to all N sets. We demonstrate that our algorithm is more efficient in the sequential case and has a vastly higher capacity for parallelization, with a 19x speedup over the existing algorithm.
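To make the N-way intersection problem concrete, the following is a minimal sketch of a plain sweep over interval endpoints. It is not the "slice-then-sweep" algorithm described in the abstract, whose details are not given here; it only illustrates the kind of query being answered. The function name `common_regions`, the half-open `(start, end)` representation, and the restriction to a single chromosome are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) with start < end,
# and all intervals lie on the same chromosome.
Interval = Tuple[int, int]


def common_regions(interval_sets: List[List[Interval]]) -> List[Interval]:
    """Return the regions covered by at least one interval from every set."""
    n = len(interval_sets)

    # Turn every interval into an "open" and a "close" event.
    # Close events use flag 0 so they sort before opens at the same position,
    # which keeps abutting half-open intervals from counting as overlapping.
    events = []
    for set_idx, intervals in enumerate(interval_sets):
        for start, end in intervals:
            events.append((start, 1, set_idx))
            events.append((end, 0, set_idx))
    events.sort()

    open_counts = [0] * n   # currently open intervals per set
    sets_covering = 0       # number of sets with at least one open interval
    region_start = None
    result = []

    for pos, is_open, set_idx in events:
        if is_open:
            open_counts[set_idx] += 1
            if open_counts[set_idx] == 1:
                sets_covering += 1
                if sets_covering == n:
                    # All N sets now cover this position.
                    region_start = pos
        else:
            open_counts[set_idx] -= 1
            if open_counts[set_idx] == 0:
                if sets_covering == n:
                    # One set stops covering, so the common region ends here.
                    result.append((region_start, pos))
                sets_covering -= 1
    return result


if __name__ == "__main__":
    genes = [(1, 10), (20, 30)]
    peaks = [(5, 25)]
    repeats = [(0, 8), (22, 40)]
    print(common_regions([genes, peaks, repeats]))
```

This sketch sorts all endpoints together, so it runs in O(M log M) time for M total intervals and is inherently sequential; the abstract's claim is that the authors' approach improves on this kind of baseline and parallelizes far better.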
Keywords: Genomic interval intersection; bioinformatics; computational biology; genome analysis; parallel algorithm
Year: 2017 PMID: 30333632 PMCID: PMC6188649 DOI: 10.1109/JPROC.2015.2461494
Source DB: PubMed Journal: Proc IEEE Inst Electr Electron Eng ISSN: 0018-9219 Impact factor: 10.961