Kazumitsu Maehara, Jun Odawara, Akihito Harada, Tomohiko Yoshimi, Koji Nagao, Chikashi Obuse, Koichi Akashi, Taro Tachibana, Toshio Sakata, Yasuyuki Ohkawa.
Abstract
Deep sequencing approaches, such as chromatin immunoprecipitation followed by sequencing (ChIP-seq), have been successful in detecting transcription factor-binding sites and histone modifications across the whole genome. An approach for comparing two different ChIP-seq data sets would be beneficial for predicting the unknown functions of a factor. We propose a model to represent the co-localization of two different ChIP-seq data sets. We showed that this model can separate a meaningful overlapping signal from a meaningless background signal. We applied this model to compare ChIP-seq data of RNA polymerase II C-terminal domain (CTD) serine 2 phosphorylation with a large amount of peak-called data, including ChIP-seq and other deep sequencing data from the Encyclopedia of DNA Elements (ENCODE) project. We then extracted factors related to RNA polymerase II CTD serine 2 phosphorylation in HeLa cells. We further analyzed RNA polymerase II CTD serine 7 phosphorylation, whose function in HeLa cells is still unclear. Our results were characterized by the similarity of localization to transcription factors and histone modifications in the ENCODE data set. This suggests that our model is appropriate for understanding ChIP-seq data for factors whose function is unknown.
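For concreteness, the mixture of a localized signal and a uniform background described in Figure 1 can be written as follows. This is a sketch in our own notation, not the authors' published equation. The names w (signal weight), k (binned absolute distance from a target peak to a reference peak center) and D (index of the last distance bin) are assumptions. The truncated-geometric form simply follows the caption's mention of a geometric distribution whose parameter p sets the steepness.

```latex
f(k \mid w, p) \;=\; w\,\frac{p\,(1-p)^{k}}{1-(1-p)^{D+1}} \;+\; (1-w)\,\frac{1}{D+1},
\qquad k = 0, 1, \dots, D, \quad 0 < p < 1, \quad w \le 1
```

The first term is the localized signal and the second is the uniform background, and both sum to 1 over the D + 1 bins. Under this reading, w = 0 means the target peaks ignore the reference, and w near 1 means tight co-localization. If w is allowed to go negative while f(0 | w, p) ≥ 0 still holds, the fit can describe a dip near the reference, which is an ‘exclusive’ shape. We do not know whether the published co-localization score equals w, so treat that link as an assumption.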
Year: 2012 PMID: 23125363 PMCID: PMC3592427 DOI: 10.1093/nar/gks1010
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic drawings of the co-localization model. (A) The concept of the co-localization model. (1) ChIP-seq data: called peaks from different ChIP-seq experiments vary in their location. (2) ‘Distributions’: the distribution of the target ChIP-seq peaks around the centers of all peaks in another ChIP-seq data set. (3) Model: the ‘distribution’ is a mixture of an exponential signal and a uniform background. Two mixing weights represent the mixture ratio, and p represents the steepness of the shape. (4) Profiling: the model represents shapes ranging from ‘cooperative’ to ‘exclusive’. (B) Workflows: model-fitting examples of ‘cooperative’ (upper row) and ‘exclusive’ (lower row) types of accumulation. Peaks were aggregated at reference peaks and converted into a representation of distance from the reference factor (x-axis) versus probability of peak detection (y-axis). The mixture ratio of valid signal to background, together with the shape parameter, was then estimated with the co-localization model. Some examples of the parameter p of the geometric distribution are shown in the bottom-left panel; p determines the steepness of the slope.
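Below is a minimal sketch of the fitting step in panel B. It assumes a truncated-geometric signal plus a uniform background over binned absolute distances. The caption does not say how the parameters were estimated, so plain grid-search maximum likelihood stands in here. The names `truncated_geometric`, `fit_colocalization` and the signal weight `w` are hypothetical. Letting `w` drop below zero (while keeping the density positive) is also our assumption, used to represent the ‘exclusive’ shape.

```python
import math


def truncated_geometric(k: int, p: float, width: int) -> float:
    """Geometric probability P(K = k), renormalised to the bins 0..width."""
    return p * (1.0 - p) ** k / (1.0 - (1.0 - p) ** (width + 1))


def fit_colocalization(counts: list[int]) -> tuple[float, float]:
    """Grid-search maximum-likelihood fit of (w, p) to binned |distance| counts.

    counts[k] is the number of target peaks whose distance to the nearest
    reference peak centre falls in bin k. w is the (hypothetical) signal
    weight; negative values model depletion near the reference.
    """
    width = len(counts) - 1
    uniform = 1.0 / (width + 1)
    best_ll, best_w, best_p = -math.inf, 0.0, 0.5
    for i in range(1, 100):  # p = 0.01 .. 0.99
        p = i / 100
        geom = [truncated_geometric(k, p, width) for k in range(width + 1)]
        for j in range(-100, 101):  # w = -1.00 .. 1.00
            w = j / 100
            dens = [w * g + (1.0 - w) * uniform for g in geom]
            # geom is monotone in k, so dens is too: its minimum is at an end
            if min(dens[0], dens[-1]) <= 0.0:
                continue
            ll = sum(c * math.log(d) for c, d in zip(counts, dens) if c > 0)
            if ll > best_ll:
                best_ll, best_w, best_p = ll, w, p
    return best_w, best_p


if __name__ == "__main__":
    W = 20
    truth = [0.6 * truncated_geometric(k, 0.3, W) + 0.4 / (W + 1) for k in range(W + 1)]
    print(fit_colocalization([round(100_000 * f) for f in truth]))  # close to (0.6, 0.3)
```

Grid search keeps the sketch dependency-free and makes the positivity constraint on `w` easy to read. For many pairwise comparisons, a numerical optimizer would be the faster choice.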
Figure 2. Evaluation of the co-localization model using S2ph with ENCODE data sets. Graphs show paired analyses made with our software. The antibody/cell line of the ENCODE data used is shown at the top of each graph, and detailed information on the ENCODE data is given under the title (upper left). Localization of the ENCODE peaks is shown over the region from 5 kb upstream to 5 kb downstream of all genes. The y-axis shows the total count of peak centers divided by the number of human genes. Peaks outside this region are reported as the percentage in intergenic regions (upper right). The x-axis shows the distance from the centers of the ENCODE peaks, with both positive and negative distances plotted. The y-axis shows the total depth of S2ph peaks at each distance. The dotted gray line is our model fitted to the distribution of the S2ph peaks. The shape is relatively symmetric about X = 0 because the orientation of transcription was ignored here, unlike in the localization plot of the ENCODE peaks. The percentage of S2ph peaks within 5 kb of the ENCODE peak centers is noted in the top-left legend (bottom). The rightmost bar in each panel consists of black and white rectangles and is a type of Venn diagram. The vertical length from the bottom of the white rectangle to the top of the black rectangle indicates the number of ENCODE peaks. The length from the top of the white rectangle to the bottom of the black rectangle indicates the number of S2ph peaks. The length of the black rectangle indicates the number of overlaps between the peaks of the two compared data sets (right). (A) A transcription factor with a high score against S2ph; (B) a histone modification with a low score and high concentration against S2ph; (C) a histone modification with a negative score (exclusively related) against S2ph; (D) the ENCODE profile for S2ph data paired with 90 types of HeLa data sets, plotted as points in the parameter space spanned by the co-localization score (x-axis) and the concentration parameter (y-axis). (E) Scatter plot of the co-localization scores of S2ph at different MACS threshold parameters. The x-axis shows the scores at P-value = 1e−2 and the y-axis shows the scores at P-value = 1e−5. The line is the linear regression fit, and the R² and estimated coefficient of the regression are also displayed.
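The distance profile and the "percentage of S2ph peaks within 5 kb" described above can be sketched roughly as follows. This sketch makes several assumptions. Peaks are integer center coordinates on a single chromosome. Each target peak is assigned to its nearest reference center. Peaks are counted rather than weighted by read depth. The helpers `nearest_signed_distance`, `distance_profile` and `bin_abs_distances` are hypothetical names.

```python
from bisect import bisect_left


def nearest_signed_distance(center: int, refs: list[int]) -> int:
    """Signed distance from a target peak centre to the closest reference centre.

    refs must be sorted and non-empty; on a tie the smaller coordinate wins.
    Positive means the target lies at a larger coordinate than the reference.
    """
    i = bisect_left(refs, center)
    candidates = refs[max(i - 1, 0):i + 1]  # neighbours on either side of center
    closest = min(candidates, key=lambda r: abs(center - r))
    return center - closest


def distance_profile(targets: list[int], refs: list[int], window: int = 5000):
    """Distances within +/-window, and the percentage of targets inside it."""
    ordered = sorted(refs)
    dists = [nearest_signed_distance(c, ordered) for c in targets]
    inside = [d for d in dists if abs(d) <= window]
    pct = 100.0 * len(inside) / len(dists) if dists else 0.0
    return inside, pct


def bin_abs_distances(dists: list[int], bin_size: int, window: int) -> list[int]:
    """Histogram of |distance| in bins 0..window // bin_size (orientation ignored)."""
    counts = [0] * (window // bin_size + 1)
    for d in dists:
        if abs(d) <= window:
            counts[abs(d) // bin_size] += 1
    return counts


if __name__ == "__main__":
    inside, pct = distance_profile([1100, 4900, 12000, 30000], [20000, 1000, 5000])
    print(inside, pct)  # [100, -100] 50.0
```

Folding distances to absolute values mirrors the caption's point that transcription orientation is ignored. The binned counts are the kind of input a mixture fit would take.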
Figure 3. Factors co-localized with the phosphorylation states of Pol II CTD S2/S5/S7. (A) Scatter plots of co-localization scores for pairs of Pol II CTD S2/S5/S7 phosphorylation data. The diagonal panels show the data names, which label the data pairs in the off-diagonal panels. The lower triangular panels show scatter plots of co-localization scores and regression lines for each phosphorylation pair. Their x- and y-axes are the co-localization scores of the data labeled in the corresponding diagonal panels. The upper triangular panels show Pearson’s correlation coefficients for each phosphorylation pair. (B) Three-factor variation plot. This plot shows the surface spanned by the differences between three variables: the three scores of one factor paired with S2ph, S5ph and S7ph. Each arrow indicates the direction of differences between scale-corrected co-localization scores. means the direction in which the difference between the co-localization scores of a factor paired with S2ph and with S5ph is >0 (). The other axes were labeled in a similar manner. A factor located between the axes and is regarded as an S7ph-specific cooperative factor. Factors far from the center are labeled with the names of their antibodies. Factors at the center were unaffected by any of the phosphorylation changes.
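The correlation coefficients of panel A and one possible layout for panel B can be sketched as follows. The caption specifies neither the scale correction nor the exact projection. Two choices here are our assumptions: z-scoring each score vector across factors, and placing the three pairwise differences on axes 120° apart. The helpers `pearson`, `zscore` and `variation_coordinates` are hypothetical names. A factor with equal scale-corrected scores for S2ph, S5ph and S7ph lands at the center, matching the caption's description of unaffected factors.

```python
import math
from statistics import mean, pstdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscore(values: list[float]) -> list[float]:
    """Assumed scale correction: centre and scale one score vector (non-constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def variation_coordinates(s2, s5, s7):
    """2-D position per factor from its scores paired with S2ph, S5ph and S7ph."""
    z2, z5, z7 = zscore(s2), zscore(s5), zscore(s7)
    angles = [math.radians(a) for a in (90, 210, 330)]  # one axis per difference
    points = []
    for a, b, c in zip(z2, z5, z7):
        diffs = (a - b, b - c, c - a)  # S2-S5, S5-S7, S7-S2
        x = sum(d * math.cos(t) for d, t in zip(diffs, angles))
        y = sum(d * math.sin(t) for d, t in zip(diffs, angles))
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
    print(variation_coordinates([0, 1, 2], [0, 1, 2], [2, 1, 0]))
```

Because the three differences always sum to zero, the plot only needs two dimensions. A factor's distance from the center measures how unevenly it co-localizes with the three phosphorylation states.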