William W Greenwald, He Li, Erin N Smith, Paola Benaglio, Naoki Nariai, Kelly A Frazer.
Abstract
BACKGROUND: Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single-locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from "genomic interaction" studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed.
Keywords: Bedtools; Chromatin conformation capture; Genomic arithmetic; Hi-C; ChIA-PET; Paired-genomic-loci; Peak; Tool suite
Year: 2017 PMID: 28388874 PMCID: PMC5384132 DOI: 10.1186/s12859-017-1621-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Pgltools implementation. (a) An example of sorted, single-locus BED file entries from a file sorted by start position. As entry 1 overlaps entry 3, entry 2 must also overlap entry 3. (b) A pictorial representation of PGL entries in a sorted PGL file where non-sequential PGL entries overlap. Loci are shown as blocks, with dashed lines connecting the paired loci that make up a single entry. Both loci A and B in PGL entries 1 and 3 overlap, and both loci in PGL entries 2 and 4 overlap. (c) A flowchart of the overlap function shared by many operations in pgltools. File 2 has N entries, iterated by the File2-index i, so File2[i] is a PGL entry for any 0 ≤ i < N. Throughout the algorithm, PGL entries from File 2 must be checked multiple times; to reduce the number of comparisons performed by pgltools, the Recheck Index stores the index at which the previous overlap iteration began. The algorithm ends when the ends of both files are reached.
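The recheck-index idea in panel (c) resembles a standard sweep over two sorted interval lists. The following is a minimal Python sketch under stated assumptions, not pgltools' actual implementation. The `PGL` tuple and all function names are hypothetical. Coordinates are taken as 0-based and half-open (BED-style). Both inputs are assumed to be sorted by locus A chromosome (plain string order) and then by locus A start.

```python
from typing import List, NamedTuple, Tuple


class PGL(NamedTuple):
    """Hypothetical in-memory PGL entry.

    Coordinates are assumed 0-based and half-open, as in BED."""
    chr_a: str
    start_a: int
    end_a: int
    chr_b: str
    start_b: int
    end_b: int


def loci_overlap(c1: str, s1: int, e1: int, c2: str, s2: int, e2: int) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return c1 == c2 and s1 < e2 and s2 < e1


def pgl_overlap(p: PGL, q: PGL) -> bool:
    """Paired loci overlap only if locus A overlaps locus A AND locus B overlaps locus B."""
    return (loci_overlap(p.chr_a, p.start_a, p.end_a, q.chr_a, q.start_a, q.end_a)
            and loci_overlap(p.chr_b, p.start_b, p.end_b, q.chr_b, q.start_b, q.end_b))


def sweep_overlaps(file1: List[PGL], file2: List[PGL]) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where file1[i] overlaps file2[j].

    Assumes both lists are sorted by (chr_a, start_a), chromosome names
    compared as plain strings."""
    pairs: List[Tuple[int, int]] = []
    recheck = 0  # first file-2 index that could still overlap a later file-1 entry
    for i, p in enumerate(file1):
        # Advance the recheck index past file-2 entries whose locus A lies wholly
        # before p's locus A; since file 1 is sorted, no later entry can overlap them.
        while recheck < len(file2) and (
            file2[recheck].chr_a < p.chr_a
            or (file2[recheck].chr_a == p.chr_a and file2[recheck].end_a <= p.start_a)
        ):
            recheck += 1
        # Re-scan from the recheck index; stop once file-2 loci A start at or past
        # the end of p's locus A, since later entries start even further right.
        j = recheck
        while (j < len(file2) and file2[j].chr_a == p.chr_a
               and file2[j].start_a < p.end_a):
            if pgl_overlap(p, file2[j]):
                pairs.append((i, j))
            j += 1
    return pairs
```

The sort key covers locus A only. Two overlapping PGL entries therefore need not be adjacent (panel b). For this reason, the scan for each file-1 entry restarts at the recheck index rather than where the previous scan stopped.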
Summary of operations provided in pgltools
| Method | Description |
|---|---|
| intersect | Find overlapping paired-genomic-loci from two PGL files |
| merge | Merge nearby paired-genomic-loci within a single file and produce a column containing summary statistics requested through passed parameters (-c and -o) |
| subtract | Find parts of paired-genomic-loci from a PGL file that do not overlap another PGL file |
| window | Filter a PGL file to a particular genomic region |
| samTopgl | Convert a SAM file to a PGL file |
| coverage | Find the coverage of a PGL file on another PGL file; usually used to count the reads from a PGL file derived from a SAM file that fall on a set of PGLs. A PGL entry from file 2 is counted if it overlaps a PGL entry from file 1; it need not be fully contained within it. |
| closest | Find the closest PGL entry from a PGL file for each PGL entry in another PGL file |
| expand | Expand both loci by a given size |
| intersect1D | Find the paired-genomic-loci that overlap regions from a BED file |
| closest1D | Find the closest paired-genomic-loci to a set of regions from a BED file |
| subtract1D | Find the parts of paired-genomic-loci that do not overlap regions from a BED file |
| sort | Sort a PGL file for use with other pgltools operations |
| formatbedpe | Convert a BEDPE-like file to a PGL file |
| formatTripSparse | Convert a triplet sparse matrix file set to a PGL file |
| conveRt | Format a PGL file for use with the GenomicInteractions R package |
| browser | Format a PGL file to be viewed in the UCSC Genome Browser |
| juicebox | Format a PGL file to be viewed in Juicebox |
| condense | Convert a PGL file to a BED file with two entries for each PGL entry. |
| findLoops | Convert a PGL file to a BED file with an entry containing the region from the start of anchor A to the stop of anchor B for intra-chromosomal PGLs, and an entry for each anchor for inter-chromosomal PGLs. |
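Several of the conversions in the table are simple per-entry transformations. As an illustration, the hypothetical Python functions below mirror the table's descriptions of expand, condense and findLoops; they are not the pgltools API. The sketch makes three assumptions: BED-style half-open coordinates; that expand widens each locus by the given size on both sides, clamping starts at zero; and that in intra-chromosomal entries anchor A lies upstream of anchor B.

```python
from typing import List, NamedTuple, Tuple


class PGL(NamedTuple):
    """Hypothetical in-memory PGL entry: locus A and locus B, BED-style coordinates."""
    chr_a: str
    start_a: int
    end_a: int
    chr_b: str
    start_b: int
    end_b: int


BedRow = Tuple[str, int, int]  # (chrom, start, end)


def expand(p: PGL, size: int) -> PGL:
    """Widen both loci by `size` bases on each side (assumed semantics), clamping at 0."""
    return PGL(p.chr_a, max(0, p.start_a - size), p.end_a + size,
               p.chr_b, max(0, p.start_b - size), p.end_b + size)


def condense(pgls: List[PGL]) -> List[BedRow]:
    """Emit two BED rows per PGL entry: locus A, then locus B."""
    rows: List[BedRow] = []
    for p in pgls:
        rows.append((p.chr_a, p.start_a, p.end_a))
        rows.append((p.chr_b, p.start_b, p.end_b))
    return rows


def find_loops(pgls: List[PGL]) -> List[BedRow]:
    """Intra-chromosomal entries give one row from the start of A to the end of B;
    inter-chromosomal entries give one row per anchor."""
    rows: List[BedRow] = []
    for p in pgls:
        if p.chr_a == p.chr_b:
            # Assumes anchor A is upstream of anchor B, so start_a < end_b.
            rows.append((p.chr_a, p.start_a, p.end_b))
        else:
            rows.append((p.chr_a, p.start_a, p.end_a))
            rows.append((p.chr_b, p.start_b, p.end_b))
    return rows
```

For an intra-chromosomal entry, find_loops yields a single span covering the whole loop. The same entry passed to condense yields its two anchors as separate BED rows.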
Fig. 2 The operations of pgltools. PGL entries from file one are shown in various shades of blue, PGL entries from file two are shown in orange, and windows are shown in yellow (see legend at bottom right). All resulting outputs are shown below dashed lines, with novel entries shown in green and original entries shown in their original color. (a) The intersect operation finds overlapping paired-genomic-loci between two PGL files and returns the overlapping regions. (b) The merge operation combines overlapping paired-genomic-loci within a single PGL file. (c) The subtract operation returns the PGL entries from file one with the PGL entries from file two removed. (d) The window operation returns the PGL entries that fall completely within a specified genomic region. (e) The coverage operation returns the number of PGL entries from file two that overlap each PGL entry in file one. (f) The closest operation returns the closest PGL entry from file two for each PGL entry in file one. (g) The intersect1D operation returns PGL entries from file one that overlap regions in a BED file. (h) The closest1D operation returns the closest region from a BED file for each PGL entry in file one. (i) The subtract1D operation returns the PGL entries from file one with the regions from a BED file removed.
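The counting and filtering rules in panels (e) and (g) can be stated in a few lines. The sketch below is a brute-force, quadratic illustration rather than pgltools' own code, which per Fig. 1c sweeps sorted files. The sketch makes three assumptions: a hypothetical `PGL` tuple, half-open BED-style coordinates, and an intersect1D rule that keeps an entry when either of its loci overlaps a BED region.

```python
from typing import List, NamedTuple, Tuple


class PGL(NamedTuple):
    """Hypothetical in-memory PGL entry with BED-style (half-open) loci."""
    chr_a: str
    start_a: int
    end_a: int
    chr_b: str
    start_b: int
    end_b: int


BedRow = Tuple[str, int, int]  # (chrom, start, end)


def overlaps_1d(c1: str, s1: int, e1: int, c2: str, s2: int, e2: int) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return c1 == c2 and s1 < e2 and s2 < e1


def pgl_overlap(p: PGL, q: PGL) -> bool:
    """Both loci must overlap their counterparts for two PGL entries to overlap."""
    return (overlaps_1d(p.chr_a, p.start_a, p.end_a, q.chr_a, q.start_a, q.end_a)
            and overlaps_1d(p.chr_b, p.start_b, p.end_b, q.chr_b, q.start_b, q.end_b))


def coverage(file1: List[PGL], file2: List[PGL]) -> List[int]:
    """For each file-1 entry, count the file-2 entries that overlap it (panel e)."""
    return [sum(1 for q in file2 if pgl_overlap(p, q)) for p in file1]


def intersect_1d(pgls: List[PGL], bed: List[BedRow]) -> List[PGL]:
    """Keep PGL entries where locus A or locus B overlaps any BED region (assumed rule)."""
    hits: List[PGL] = []
    for p in pgls:
        if any(overlaps_1d(p.chr_a, p.start_a, p.end_a, *r)
               or overlaps_1d(p.chr_b, p.start_b, p.end_b, *r) for r in bed):
            hits.append(p)
    return hits
```

The two-locus test in `pgl_overlap` is what separates these paired operations from their single-locus counterparts. An entry whose locus A overlaps but whose locus B does not is not counted by coverage.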