Syed Haider, Daryl Waggott, Emilie Lalonde, Clement Fung, Fei-Fei Liu, Paul C Boutros.
Abstract
BACKGROUND: Next-generation sequencing is making it critical to handle genomic ranges robustly and rapidly within standard pipelines. Standard use cases include annotating sequence ranges with gene or other genomic annotations, merging multiple experiments, and then quantifying and visualizing their overlap. The most widely used tools for these tasks work at the command line (e.g. BEDTools), and the few available R packages are either slow or have semantics and features that differ from those of the command-line tools.
Keywords: BED format; Data integration; Genomic intervals; Sequence algebra
Year: 2016 PMID: 27999613 PMCID: PMC5157088 DOI: 10.1186/s13029-016-0059-5
Source DB: PubMed Journal: Source Code Biol Med ISSN: 1751-0473
Fig. 1 Overview of the bedr package. bedr can run on a commodity Linux-based computer or on a cloud/cluster. Users interface with the underlying driver engines (BEDTools, BEDOPS, tabix, GenomicRanges) through bedr methods in R. This enables integration of multiple user-specified genomic interval sets with reference data sources such as gene annotations (e.g. UCSC) and disease-specific features (e.g. COSMIC). Such integration spans the general-purpose genomic interval operations of intersection (∩), union (∪) and joins. Output is returned in R-friendly data structures for convenience in downstream analyses. These data structures are readily convertible to standard data exchange formats such as BED and GRanges using bedr utility methods.
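To make the intersection and union operations named in the Fig. 1 caption concrete, here is a minimal Python sketch, assuming BED-style half-open (chrom, start, end) tuples. The helpers `merge`, `union` and `intersect` are hypothetical names for illustration only; they are not bedr functions, and bedr itself delegates this work to engines such as BEDTools or BEDOPS.

```python
from typing import List, Tuple

# A region is (chrom, start, end) with half-open coordinates [start, end),
# as in BED. Hypothetical illustration, not bedr's API.
Region = Tuple[str, int, int]


def merge(regions: List[Region]) -> List[Region]:
    """Sort, then fuse regions that overlap or touch (book-ended)."""
    out: List[Region] = []
    for chrom, start, end in sorted(regions):
        if out and out[-1][0] == chrom and start <= out[-1][2]:
            # Overlaps or touches the previous region on the same chromosome: extend it.
            prev = out[-1]
            out[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            out.append((chrom, start, end))
    return out


def union(a: List[Region], b: List[Region]) -> List[Region]:
    """Set union: every base covered by a or b."""
    return merge(a + b)


def intersect(a: List[Region], b: List[Region]) -> List[Region]:
    """Set intersection: bases covered by both a and b (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = 0
    out: List[Region] = []
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            # Both lists are sorted by chromosome name; advance the one that is behind.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            out.append((ca, lo, hi))
        # The region that ends first cannot overlap anything further in the other list.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out
```

For example, with `a = [("chr1", 10, 100), ("chr1", 200, 250)]` and `b = [("chr1", 50, 220)]`, `union(a, b)` gives `[("chr1", 10, 250)]` and `intersect(a, b)` gives `[("chr1", 50, 100), ("chr1", 200, 220)]`. Merging book-ended regions mirrors the default of `bedtools merge`; other tools may treat adjacency differently.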
Fig. 2 Illustration of key bedr operations. A bedr regions object represents a collection of sub-regions specified as an R vector or data.frame. Three partially overlapping example regions (a, b and c) located at the beginning of human chromosome 1 (red mark on ideogram, 1-250 bp) are shown. Vertical gray separators between sub-regions indicate regions that are 1 base pair apart. Overlapping regions can be merged, joined or subtracted, each yielding a new regions object, as shown. The associated source code snippets are documented in the Results section. The regions object flank (b, 5 bp) exemplifies the bedr utility flank.regions, which creates flanking (upstream and/or downstream) regions of a specified length; ±5 bp in this example.
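The subtraction and flanking operations illustrated in Fig. 2 can be sketched in a few lines of Python, assuming BED-style half-open (chrom, start, end) tuples and ignoring strand. `subtract` and `flank` are hypothetical helpers written for this illustration; they are not bedr's `flank.regions` or any other bedr function, and their exact edge handling is an assumption.

```python
from typing import List, Tuple

# Regions are (chrom, start, end), half-open [start, end) as in BED.
# subtract and flank are hypothetical helpers, not bedr functions.
Region = Tuple[str, int, int]


def subtract(a: List[Region], b: List[Region]) -> List[Region]:
    """Remove from each region in `a` the bases covered by any region in `b`."""
    out: List[Region] = []
    for chrom, start, end in a:
        pieces = [(start, end)]  # parts of this region not yet cut away
        for b_chrom, b_start, b_end in b:
            if b_chrom != chrom:
                continue
            kept = []
            for s, e in pieces:
                if b_end <= s or b_start >= e:
                    kept.append((s, e))  # no overlap: keep the piece whole
                    continue
                if s < b_start:
                    kept.append((s, b_start))  # remainder left of the cut
                if b_end < e:
                    kept.append((b_end, e))  # remainder right of the cut
            pieces = kept
        out.extend((chrom, s, e) for s, e in pieces)
    return out


def flank(regions: List[Region], n: int) -> List[Region]:
    """Return n-bp regions immediately left and right of each input region.

    Strand is ignored, left flanks are clipped at coordinate 0, and right
    flanks are not clipped because chromosome lengths are unknown here.
    """
    out: List[Region] = []
    for chrom, start, end in regions:
        if start > 0:
            out.append((chrom, max(0, start - n), start))
        out.append((chrom, end, end + n))
    return out
```

For example, `subtract([("chr1", 10, 100)], [("chr1", 40, 60)])` returns `[("chr1", 10, 40), ("chr1", 60, 100)]`, and `flank([("chr1", 20, 50)], 5)` returns `[("chr1", 15, 20), ("chr1", 50, 55)]`, i.e. ±5 bp as in the figure. The nested loops are quadratic; a larger-scale version would sort both inputs and sweep them once.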
| | index | V4 | V5 | V6 |
|---|---|---|---|---|
| 1 | chr1:10-100 | . | -1 | -1 |
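A row that pairs a region (here chr1:10-100) with ".", -1 and -1 is what a left outer join reports when the query region overlaps nothing; `bedtools intersect -loj` uses a similar null-feature placeholder. The following minimal Python sketch shows that join logic. `left_join` is a hypothetical name, the inputs are assumed to be BED-style half-open (chrom, start, end) tuples, and this is not how bedr implements its joins.

```python
from typing import List, Tuple

# Regions are (chrom, start, end), half-open [start, end) as in BED.
# left_join is a hypothetical illustration, not a bedr function.
Region = Tuple[str, int, int]

# Placeholder for the chrom/start/end columns when nothing overlaps.
NULL_FEATURE = (".", -1, -1)


def left_join(query: List[Region], features: List[Region]) -> List[tuple]:
    """Left outer join: one row per (query, overlapping feature) pair, or a
    single placeholder row when a query region overlaps nothing."""
    rows = []
    for chrom, start, end in query:
        index = f"{chrom}:{start}-{end}"
        hits = [
            f for f in features
            if f[0] == chrom and f[1] < end and start < f[2]  # half-open overlap test
        ]
        if hits:
            rows.extend((index, *hit) for hit in hits)
        else:
            rows.append((index, *NULL_FEATURE))
    return rows
```

For example, `left_join([("chr1", 10, 100)], [("chr2", 5, 50)])` returns `[("chr1:10-100", ".", -1, -1)]`, the same shape as the row above. Under half-open coordinates, a feature starting exactly at the query's end (e.g. `("chr1", 100, 150)`) does not count as overlapping.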
| | index | n.overlaps | names | a | b | c |
|---|---|---|---|---|---|---|
| 1 | chr1:1-10 | 2 | b,c | 0 | 1 | 1 |
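In this layout, each row is a sub-region together with the number of input sets covering it (n.overlaps), their names, and a 0/1 indicator column per set. One way to produce such a table is to split all regions at every start/end breakpoint, so that each elementary sub-region is either fully inside or fully outside every input region. The Python sketch below does exactly that. `join_multiple` and the toy inputs are hypothetical, the coordinates are assumed to be half-open, and this is not necessarily how bedr computes its multi-region join.

```python
from typing import Dict, List, Tuple

# Regions are (chrom, start, end), half-open [start, end).
# join_multiple is a hypothetical illustration, not a bedr function.
Region = Tuple[str, int, int]


def join_multiple(sets: Dict[str, List[Region]]) -> List[dict]:
    """Split all regions at every breakpoint and report, for each sub-region,
    how many sets cover it, which ones, and a 0/1 indicator per set."""
    names = sorted(sets)
    breakpoints: Dict[str, set] = {}
    for regions in sets.values():
        for chrom, start, end in regions:
            breakpoints.setdefault(chrom, set()).update((start, end))
    rows = []
    for chrom in sorted(breakpoints):
        points = sorted(breakpoints[chrom])
        for s, e in zip(points, points[1:]):
            # [s, e) lies between consecutive breakpoints, so each input region
            # either contains it entirely or does not touch it.
            covered = {
                name: int(any(c == chrom and r_start <= s and e <= r_end
                              for c, r_start, r_end in sets[name]))
                for name in names
            }
            hits = [name for name in names if covered[name]]
            if not hits:
                continue  # gap between regions: no set covers it
            row = {"index": f"{chrom}:{s}-{e}", "n.overlaps": len(hits), "names": ",".join(hits)}
            row.update(covered)
            rows.append(row)
    return rows
```

With the toy input `{"a": [("chr1", 10, 30)], "b": [("chr1", 1, 20)], "c": [("chr1", 1, 10)]}`, the first row is `{"index": "chr1:1-10", "n.overlaps": 2, "names": "b,c", "a": 0, "b": 1, "c": 1}`, followed by rows for chr1:10-20 (a,b) and chr1:20-30 (a).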
| | CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 69345 | COSM911918 | C | A | NA | `<NA>` | GENE=OR4F5;STRAND=+;CDS=c.255C>A;AA=p.I85I;CNT=1 |
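The INFO column packs several annotations into one string. Under the VCF convention, fields are separated by ";", a value follows "=", a key without "=" is a flag, and "." marks a missing INFO value. Here is a minimal Python sketch that unpacks it into a dictionary. `parse_info` is a hypothetical helper, values are left as strings, and percent-encoded characters are not decoded.

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into a key -> value dict (flags map to True)."""
    if info in ("", "."):
        return {}  # missing INFO column
    out: Dict[str, Union[str, bool]] = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        out[key] = value if sep else True  # no "=" means a boolean flag
    return out
```

Applied to the INFO string of COSM911918, it yields `{"GENE": "OR4F5", "STRAND": "+", "CDS": "c.255C>A", "AA": "p.I85I", "CNT": "1"}`.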
GRanges object with 6 ranges

| | seqnames | ranges | strand |
|---|---|---|---|
| | `<Rle>` | `<IRanges>` | `<Rle>` |
| [1] | chr1 | [10, 100] | `*` |
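A GRanges object prints each range as seqnames, ranges and strand, where `*` means the strand is unspecified. The Python sketch below converts a region string such as chr1:10-100 into a record with those same three fields. `Range` and `parse_region` are hypothetical names. Coordinates are copied exactly as written, with no 0-based/1-based conversion, because the source does not state how bedr maps BED coordinates onto GRanges.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical record mirroring the seqnames/ranges/strand columns."""
    seqnames: str
    start: int
    end: int
    strand: str = "*"  # "*" = strand not specified


# "chrom:start-end"; the chromosome name may contain anything except ":".
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(s: str) -> Range:
    """Parse "chrom:start-end" into a Range, copying coordinates unchanged."""
    m = _REGION.match(s)
    if m is None:
        raise ValueError(f"not a chrom:start-end string: {s!r}")
    start, end = int(m["start"]), int(m["end"])
    if start > end:
        raise ValueError(f"start > end in {s!r}")
    return Range(m["chrom"], start, end)
```

For example, `parse_region("chr1:10-100")` returns `Range(seqnames="chr1", start=10, end=100, strand="*")`. Malformed strings and inverted coordinates raise `ValueError` instead of being silently accepted.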