Aaron T L Lun, Malcolm Perry, Elizabeth Ing-Simmons.
Abstract
The study of genomic interactions has been greatly facilitated by techniques such as chromatin conformation capture with high-throughput sequencing (Hi-C). These genome-wide experiments generate large amounts of data that require careful analysis to obtain useful biological conclusions. However, development of the appropriate software tools is hindered by the lack of basic infrastructure to represent and manipulate genomic interaction data. Here, we present the InteractionSet package that provides classes to represent genomic interactions and store their associated experimental data, along with the methods required for low-level manipulation and processing of those classes. The InteractionSet package exploits existing infrastructure in the open-source Bioconductor project, while in turn being used by Bioconductor packages designed for higher-level analyses. For new packages, use of the functionality in InteractionSet will simplify development, allow access to more features and improve interoperability between packages.
Keywords: ChIA-PET; Hi-C; data representation; genomic interactions; infrastructure
Year: 2016 PMID: 27303634 PMCID: PMC4890298 DOI: 10.12688/f1000research.8759.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Overview of the classes in the InteractionSet package.
Relevant slots of each class (i.e., data values stored in each object of the class) are labelled with a preceding “@”. (A) The GInteractions class represents pairwise interactions between genomic regions by storing pairs of anchor indices that refer to coordinates in a GenomicRanges object. (B) The InteractionSet class stores experimental data in an “assays” matrix where each row is an interaction and each column is a sample. Here, counts represent the number of read pairs mapped between each pair of interacting regions in each sample. (C) The ContactMatrix class represents the interaction space as a matrix, where each cell represents an interaction between the corresponding row/column regions.
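The three representations in Figure 1 can be illustrated with a small conceptual sketch. InteractionSet itself is an R/Bioconductor package; the Python below is not its API, and every name in it (Region, PairedInteractions, to_contact_matrix) is hypothetical. It assumes inclusive coordinates and, as a simplifying choice of this sketch, mirrors each value across the diagonal of the matrix.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """A genomic interval (hypothetical stand-in for one GenomicRanges entry)."""
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


@dataclass
class PairedInteractions:
    """Pairwise interactions stored as index pairs into a shared region list."""
    regions: List[Region]
    anchor1: List[int]
    anchor2: List[int]

    def __post_init__(self) -> None:
        if len(self.anchor1) != len(self.anchor2):
            raise ValueError("anchor1 and anchor2 must have equal length")
        for idx in self.anchor1 + self.anchor2:
            if not 0 <= idx < len(self.regions):
                raise ValueError(f"anchor index {idx} is out of range")

    def anchors(self, i: int) -> Tuple[Region, Region]:
        """Resolve the i-th interaction to its two anchor regions."""
        return self.regions[self.anchor1[i]], self.regions[self.anchor2[i]]


def to_contact_matrix(pairs: PairedInteractions, values: List[int]) -> List[List[int]]:
    """Lay out one value per interaction as a region-by-region matrix."""
    n = len(pairs.regions)
    mat = [[0] * n for _ in range(n)]
    for value, i, j in zip(values, pairs.anchor1, pairs.anchor2):
        mat[i][j] = value
        mat[j][i] = value  # mirrored: a choice made for this sketch only
    return mat


regions = [Region("chr1", 1, 100), Region("chr1", 101, 200), Region("chr2", 1, 150)]
pairs = PairedInteractions(regions, anchor1=[0, 0, 1], anchor2=[1, 2, 2])

# Assay-style count table: one row per interaction, one column per sample.
counts = [[5, 3], [0, 2], [7, 7]]

# Matrix view of the first sample's counts.
sample0 = to_contact_matrix(pairs, [row[0] for row in counts])
```

Storing indices rather than copies of the coordinates means that many interactions can share one region list, which is the idea panel (A) describes.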
Figure 2. Schematic of several methods in the InteractionSet package.
(A) Minimum bounding boxes can be identified for groups of interactions using the boundingBox method. Here, u′, v′ and w′ belong in one group while x′, y′ and z′ belong in another. (B) One- or two-dimensional overlaps can be identified between interactions and one or two genomic intervals, respectively, using the findOverlaps method. Here, x′ and y′ have one-dimensional overlaps with the gene and enhancer, respectively, while z′ has a two-dimensional overlap with the gene and the enhancer. (C) An InteractionSet object contains data – in this case, read pair count data – for interactions in the two-dimensional interaction space. Given a bait region, a “cross-section” of the space can be extracted and converted into a RangedSummarizedExperiment object using the linearize method. This object holds count data for intervals on the linear genome (blue lines) where the count for each interval describes the strength of the interaction between that interval and the bait. This format effectively mimics that of 4C data.
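The three operations in Figure 2 can be sketched in the same conceptual way. This is plain Python, not the R methods boundingBox, findOverlaps or linearize, and it makes no claim about how the package implements them. The function names, the tuple layout (chrom, start, end) with inclusive coordinates, and the example coordinates and counts are all assumptions made for illustration.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def bounding_boxes(interactions, groups):
    """Smallest pair of intervals enclosing all anchors in each group."""
    boxes = {}
    for (a1, a2), g in zip(interactions, groups):
        if g not in boxes:
            boxes[g] = (a1, a2)
            continue
        b1, b2 = boxes[g]
        if a1[0] != b1[0] or a2[0] != b2[0]:
            raise ValueError("a group must stay on one chromosome pair")
        boxes[g] = (
            (b1[0], min(b1[1], a1[1]), max(b1[2], a1[2])),
            (b2[0], min(b2[1], a2[1]), max(b2[2], a2[2])),
        )
    return boxes


def overlaps_1d(interaction, interval):
    """One-dimensional overlap: either anchor overlaps the interval."""
    return overlaps(interaction[0], interval) or overlaps(interaction[1], interval)


def overlaps_2d(interaction, first, second):
    """Two-dimensional overlap: one anchor hits each interval, in either order."""
    a1, a2 = interaction
    return (overlaps(a1, first) and overlaps(a2, second)) or (
        overlaps(a1, second) and overlaps(a2, first)
    )


def linearize(interactions, counts, bait):
    """Sum counts per partner interval for interactions touching the bait."""
    out = defaultdict(int)
    for (a1, a2), c in zip(interactions, counts):
        if overlaps(a1, bait):
            out[a2] += c  # if both anchors hit the bait, a2 is treated as partner
        elif overlaps(a2, bait):
            out[a1] += c
    return dict(out)


# Made-up example data.
interactions = [
    (("chr1", 10, 20), ("chr1", 100, 120)),
    (("chr1", 15, 30), ("chr1", 110, 140)),
    (("chr1", 200, 220), ("chr1", 500, 520)),
]
groups = ["a", "a", "b"]
boxes = bounding_boxes(interactions, groups)

gene = ("chr1", 18, 25)
enhancer = ("chr1", 115, 118)
hits_1d = [overlaps_1d(i, gene) for i in interactions]
hits_2d = [overlaps_2d(i, gene, enhancer) for i in interactions]

profile = linearize(interactions, [4, 6, 9], bait=("chr1", 12, 16))
```

In this sketch, profile maps each partner interval to a summed count, which is the 4C-like cross-section that panel (C) describes.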