Charles Joly Beauparlant, Fabien C Lamaze, Astrid Deschênes, Rawane Samb, Audrey Lemaçon, Pascal Belleau, Steve Bilodeau, Arnaud Droit.
Abstract
ChIP-Sequencing (ChIP-Seq) provides a vast amount of information regarding the localization of proteins across the genome. The aggregation of ChIP-Seq enrichment signal in a metagene plot is an approach commonly used to summarize data complexity and to obtain a high-level visual representation of the general occupancy pattern of a protein. Here we present the R package metagene, the graphical interface Imetagene and the companion package similaRpeak. Together, they provide a framework to integrate, summarize and compare the ChIP-Seq enrichment signal from complex experimental designs. These packages identify and quantify similarities or dissimilarities in patterns between large numbers of ChIP-Seq profiles. We used metagene to investigate the differential occupancy of regulatory factors at noncoding regulatory regions (promoters and enhancers) in relation to transcriptional activity in GM12878 B-lymphocytes. The relationships between occupancy patterns and transcriptional activity suggest two different mechanisms of action for transcriptional control: i) a "gradient effect" where the regulatory factor occupancy levels follow transcription and ii) a "threshold effect" where the regulatory factor occupancy levels max out prior to reaching maximal transcription. metagene, Imetagene and similaRpeak are implemented in R under the Artistic license 2.0 and are available on Bioconductor.
Year: 2016 PMID: 27538250 PMCID: PMC4990179 DOI: 10.1371/journal.pcbi.1004751
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. metagene workflow.
A metagene analysis requires 3 types of inputs: 1) a list of genomic regions (BED or GRanges formats), 2) alignment files (BAM format) and 3) a design sheet (data frame format) describing the relationships between samples. The alignment files are processed to extract the coverage of every genomic region. Afterward, the background is removed from the coverage and the signal is normalized (reads per million aligned, or RPM) to allow comparison between samples. The main output is the metagene plot. The other outputs are the curve values and confidence intervals (CI) used to produce the plot and an interactive heatmap with Imetagene. The results are compatible with similaRpeak for profile characterization.
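The RPM normalization step described in this caption can be illustrated with a minimal Python sketch. This is not the metagene implementation (metagene is an R package); the function name `rpm_normalize` and the toy numbers are hypothetical, and background removal is left out.

```python
def rpm_normalize(coverage, total_aligned_reads):
    """Scale raw per-position coverage to reads per million aligned (RPM)."""
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")
    scale = 1_000_000 / total_aligned_reads
    return [depth * scale for depth in coverage]


# Toy data (made up for illustration): identical raw coverage from two
# samples sequenced at different depths.
raw = [0, 5, 10, 20]
sample_a = rpm_normalize(raw, total_aligned_reads=10_000_000)
sample_b = rpm_normalize(raw, total_aligned_reads=20_000_000)
```

After scaling, the sample with twice as many aligned reads has half the normalized signal for the same raw coverage, which is what makes profiles comparable across samples.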
Fig 2. Impact of noise removal and description of the pseudometrics.
Metagene plots of the BCL11A transcription factor (A) with noise removal using the NCIS algorithm and (B) without noise removal. The x-axis is centered on enhancers and promoters ±1000 bp. The y-axis represents the mean occupancy normalized in reads per million (RPM). Each line represents the mean occupancy of the BCL11A replicates. Groups of transcriptional activity of enhancers or promoters are identified by different colors (red = no CAGE signal; green = low CAGE signal; blue = moderate CAGE signal; purple = high CAGE signal; see S1 Text). Ribbons represent the 95% confidence interval of the mean calculated using 1000 bootstraps. (C) Description of some of the pseudometrics implemented in the similaRpeak package.
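The ribbons in these plots are bootstrap confidence intervals of the mean. As a rough illustration of the idea, the sketch below computes a percentile bootstrap interval for the mean signal at a single position. The percentile method, the function name `bootstrap_mean_ci`, and the toy values are assumptions made for this example; the caption does not specify which bootstrap variant metagene uses.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lower = boot_means[int(alpha * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


# Toy RPM values at one position across several regions (made up).
signal = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2, 0.7, 1.3]
low, high = bootstrap_mean_ci(signal)
```

Repeating this at every position along the x-axis would give the lower and upper edges of a ribbon around the mean curve.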
Fig 3. Metagene profiles in enhancer and promoter regions.
(A) POLR2A, the largest subunit of Pol II. (B) TAF1, a general transcription factor. (C) ELF1, a transcription factor. The x-axis is centered on enhancers and promoters ±1000 bp. The y-axis represents the mean occupancy normalized in reads per million (RPM). Each line represents the mean occupancy of the factor replicates. Groups of transcriptional activity of enhancers or promoters are identified by different colors (red = no CAGE signal; green = low CAGE signal; blue = moderate CAGE signal; purple = high CAGE signal). The ribbons represent the 95% confidence interval of the mean calculated using 1000 bootstraps.