Ramya Raviram, Pedro P Rocha, Christian L Müller, Emily R Miraldi, Sana Badri, Yi Fu, Emily Swanzey, Charlotte Proudhon, Valentina Snetkova, Richard Bonneau, Jane A Skok.
Abstract
4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.
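The idea of adaptive window sizes mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that windows are built by grouping a fixed number `k` of consecutive observed fragments, so that windows are narrow where coverage is dense (near the bait) and wide where it is sparse (far-cis and trans). The function name `adaptive_windows` and the parameter `k` are hypothetical, and this is not presented as the 4C-ker implementation.

```python
from typing import List, Tuple


def adaptive_windows(positions: List[int], k: int) -> List[Tuple[int, int]]:
    """Group observed fragment positions into windows of k consecutive fragments.

    Hypothetical helper for illustration only: each window spans from the
    first to the last position of a chunk of up to k sorted positions, so
    window width follows the local density of observed fragments.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    pos = sorted(positions)
    windows = []
    for i in range(0, len(pos), k):
        chunk = pos[i:i + k]  # the last chunk may hold fewer than k positions
        windows.append((chunk[0], chunk[-1]))
    return windows
```

With fragments that are closely spaced near the bait and sparse further away, the same `k` yields short windows in the first region and much longer windows in the second, which is the sense in which window size adapts to signal coverage.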
Year: 2016 PMID: 26938081 PMCID: PMC4777514 DOI: 10.1371/journal.pcbi.1004780
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 2. 4C-ker outperforms other methods in the region near the bait when the similarity of interacting regions between replicates is examined for four methods.
(A-B) Example datasets for 6 bp and 4 bp cutter experiments. Raw 4C-Seq reads are shown for a 10 Mb region around the bait in (A) and a 2 Mb region in (B). The experiment in (A) was performed using activated B cells digested with HindIII and a bait near the Cd83 locus. The experiment in (B) was performed in double negative T cells digested with NlaIII and a bait near the Eβ enhancer of Tcrb. Significant interactions determined by each method for 2 replicates are shown below the raw 4C-Seq profile. Domainograms generated using 4cseqpipe are displayed for the same region. (C-D) Each dot represents the distance of the midpoint of each interacting domain to the bait plotted against its size. Plots only contain domains that overlap by 50% between replicates.
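The replicate filter and the two plotted quantities in panels (C-D) can be sketched as follows. This is a minimal illustration under stated assumptions: domains are half-open `(start, end)` intervals on one chromosome, and "overlap by 50%" is read as at least half of a replicate 1 domain being covered by a single replicate 2 domain (the caption does not say whether the overlap is reciprocal). All function names are hypothetical.

```python
from typing import List, Tuple

Domain = Tuple[int, int]  # half-open interval [start, end) on one chromosome


def overlap_fraction(a: Domain, b: Domain) -> float:
    """Fraction of domain a that is covered by domain b."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    return max(0, shared) / (a[1] - a[0])


def reproducible_domains(rep1: List[Domain], rep2: List[Domain],
                         min_frac: float = 0.5) -> List[Domain]:
    """Keep replicate 1 domains covered by at least min_frac by a replicate 2 domain.

    Assumption: the 50% criterion is applied one way (replicate 1 against
    replicate 2), not reciprocally.
    """
    return [d for d in rep1
            if any(overlap_fraction(d, other) >= min_frac for other in rep2)]


def midpoint_distance_and_size(domain: Domain, bait: int) -> Tuple[float, int]:
    """Return (distance from the domain midpoint to the bait, domain size)."""
    midpoint = (domain[0] + domain[1]) / 2
    return abs(midpoint - bait), domain[1] - domain[0]
```

Each retained domain then contributes one dot, with the midpoint-to-bait distance on one axis and the domain size on the other.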
Summary of method comparison.
| | Near-bait | Far | | Differential analysis |
|---|---|---|---|---|
| | Good | Good | Fair | Yes |
| | Majority of region called | Fair | Poor | No |
| | NA | Fair | Fair | No |
| | Restricted to the bait | Poor | Poor | Yes |
| | NA | NA | NA | Yes |
| | Visualization | NA | NA | No |