
CisMapper: predicting regulatory interactions from transcription factor ChIP-seq data.

Timothy O'Connor, Mikael Bodén¹, Timothy L Bailey².

Abstract

Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Evidence from chromatin conformation capture experiments shows that this approach is inadequate due to long-distance regulation via chromatin looping. We present CisMapper, which predicts the regulatory targets of a TF using the correlation between a histone mark at the TF's bound sites and the expression of each gene across a panel of tissues. Using both chromatin conformation capture and differential expression data, we show that CisMapper is more accurate at predicting the target genes of a TF than the distance-based approaches currently used, and is particularly advantageous for predicting the long-range regulatory interactions typical of tissue-specific gene expression. CisMapper also predicts which TF binding sites regulate a given gene more accurately than using genomic distance. Unlike distance-based methods, CisMapper can predict which transcription start site of a gene is regulated by a particular binding site of the TF.

Substances：
Transcription Factors

Year: 2017 PMID： 28204599 PMCID： PMC5389714 DOI： 10.1093/nar/gkw956

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Transcription factors regulate gene transcription by binding to specific regions of DNA called regulatory elements. This binding then activates or inhibits the action of transcriptional machinery at the transcription start site (TSS) of each gene it regulates. Particular TF binding sites are often unique to a specific cell type, condition, developmental stage or tissue (for brevity hereinafter referred to as a ‘tissue’), and defective binding due to mutations in the bound region (e.g. ‘regulatory SNPs’ (1)) or in the TF itself (2) can cause dysregulation of genes and pathological phenotypes. Thus, two key questions are (i) which genes does a given TF regulate in a particular tissue, and, for a given gene, (ii) which binding sites of the TF affect its expression? The current preferred method for determining the regulatory actions of a TF begins with predicting where it binds the genome in a given tissue using a chromatin immunoprecipitation followed by sequencing (ChIP-seq) assay (3). The next step usually assumes that each such predicted TF binding site (TFBS) regulates the closest gene, or that each gene is regulated by the closest TFBS, where distance is measured in bases (b) along the chromosome between a TSS of the gene and the TFBS. This ‘nearest neighbor’ assumption works fairly well in practice for predicting the gene targets of a TF, since many TFs regulate by binding in the promoter of the target gene. However, a good deal of regulation is via distal enhancer regions and involves chromatin looping (4,5), which causes these distance-based methods to make incorrect predictions. In one human cell line (GM12878), fully 41% of chromatin loops connecting a non-promoter region to a promoter skip one or more intervening promoters (6), violating the ‘closest gene’ assumption. Similarly, if the target gene has multiple TSSs, distance-based methods cannot tell which TSS is the actual target of a TF bound at a nearby enhancer. 
Finally, if a TF binds at multiple locations near a gene, there is no guarantee that the closest site actually regulates the gene, as the ‘closest TFBS’ method assumes. A number of methods for linking regulatory elements (such as enhancers) to target genes have previously been proposed that are not based on distance alone, but none have been tested with TFBSs predicted by TF ChIP-seq. The method of Ernst et al. (7) uses distance plus data for three histone modifications (H3K4me1, H3K4me2 and H3K27ac) and gene expression in a panel of tissues. It requires a supervised learning training step, and was not tested with regulatory elements predicted in a tissue not included in the panel. Similarly, Thurman et al. (8) showed that cross-tissue correlation of DNaseI hypersensitivity (DHS) between DHS regions overlapping promoters and DHS regions not overlapping promoters can predict regulatory relationships, but it is not clear how to extend their approach to linking TFBSs to promoters. DHS data are also available in far fewer organisms than histone modification data, restricting the applicability of that approach. The PreSTIGE algorithm (9) uses cross-tissue correlation of H3K4me1 and expression, but it was designed for linking enhancers (not TFBSs) to genes, requires CTCF binding data and only predicts links when both the H3K4me1 and expression signals are specifically enriched in a given tissue. He et al. (10) and Roy et al. (11) also proposed methods for training predictors of regulatory links between regulatory elements and genes using a large number of input features (e.g. histone modifications, DHS and TF ChIP-seq). These predictors are more accurate than the simple correlation-based approaches like PreSTIGE, but require data from many assays in order to make predictions in a tissue of interest. 
We previously described a method for predicting links between enhancers and genes using cross-tissue correlation between histone modifications and gene expression (12), and in the current work we extend and validate that approach for TFBS-gene links. Our primary goal is to provide a method for analyzing peaks from TF ChIP-seq experiments that is as easy to use as distance-based methods, but is substantially more accurate. We propose a method we call CisMapper that, like distance-based methods, only requires the user to provide the genomic locations of predicted TFBSs. Rather than using distance, CisMapper infers regulatory links from the correlation between the presence of a selected histone modification (typically H3K27ac) at the TFBS and the expression of a gene across a panel of tissues in the same organism. We make available for free download the CisMapper software (suitable for OS X, Linux or Unix) and panels of histone and expression data for human (13) and mouse (14) from ENCODE, and for human from the Roadmap Epigenomics Project (15). We show that CisMapper is substantially more accurate than distance-based methods for predicting regulatory links between a TF's binding sites and specific TSSs, that the target tissue need not be present in the tissue panel, and that the target TF need not be expressed in all the panel tissues. We also show that accuracy increases with the number of tissues in the panel, and that CisMapper predictions can improve gene enrichment analyses.

MATERIALS AND METHODS

The CisMapper algorithm

Given a set of ChIP-seq peaks for a TF in some tissue along with auxiliary information in the form of expression and histone modification data for each of a panel of tissues in the same organism, CisMapper computes a score for a (peak, TSS) link using the correlation of expression at the TSS and the presence of the histone modification at the peak across the panel of tissues (Figure 1). Specifically, the score of a (peak, TSS) link is the P-value of the Pearson correlation coefficient between the log of the histone modification signal at the peak and the log of the expression at the TSS. (Details are given in the Supplementary Data.) We also tested using the Spearman rank correlation coefficient, but found it to give worse results (data not shown).
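The link score described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text: the P-value is treated as one-sided (testing for positive correlation), it is computed from the standard Student-t null distribution with n − 2 degrees of freedom, and a pseudocount of 1 is added before taking logs. The function names are hypothetical, and the exact computation used by CisMapper is the one given in the Supplementary Data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation across the panel")
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))

def one_sided_p(r, n, steps=2000):
    """P(R >= r) under the null hypothesis for a panel of n >= 3 tissues.

    The Student-t tail with nu = n - 2 degrees of freedom is rewritten as
    C * integral of cos(theta)**(nu - 1) from asin(r) to pi/2 and
    evaluated with Simpson's rule (steps must be even).
    """
    nu = n - 2
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    lo, hi = math.asin(r), math.pi / 2
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(0.0, math.cos(lo + i * h)) ** (nu - 1)
    return c * total * h / 3

def link_score(histone, expression, pseudocount=1.0):
    """Score one (peak, TSS) link from per-tissue signals (lower is better).

    The pseudocount is an assumption of this sketch, used to avoid log(0).
    """
    log_h = [math.log(v + pseudocount) for v in histone]
    log_e = [math.log(v + pseudocount) for v in expression]
    return one_sided_p(pearson_r(log_h, log_e), len(log_h))
```

Calling `link_score` with the histone signal at one peak and the expression at one TSS, each listed in the same tissue order, returns a value near 0 when the two vary together across the panel and near 1 when they vary in opposite directions.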

Figure 1.

Schematic of the CisMapper method. CisMapper predicts regulatory links in tissue X between TF ChIP-seq peaks (red) and TSSs of genes by measuring the correlation of histone levels (shown as colored tracks) that overlap peaks (highlighted in red) with expression levels across a set (‘panel’) of tissues. Tissue X need not be present in the panel. *Distance limit is user configurable with 500 Kb chosen for this work.

Here, we study using the active enhancer mark H3K27ac (16), the poised enhancer marks H3K27me3 and H3K36me3 (17), and the active promoter mark H3K4me3 (18), but in principle any histone mark could be used with CisMapper. (Note that CisMapper only uses data for a single histone mark at a time.) Using the P-value of the correlation as the score normalizes for panel size, allowing us to compare the effect of the score threshold across experiments with varying panel sizes. The correlation of a histone mark at a ChIP-seq peak with expression at a TSS can be positive or negative, with positive correlation implying that the mark increases expression. Because we are using histone marks indicative of active enhancers and promoters, we restrict our analyses here to positive correlations.

CisMapper generates four ranked lists of predictions from the set of scored (peak, TSS) links. Two ‘target’ lists rank TSSs and genes, respectively, as potential targets of the ChIP-ed TF. The target score for a TSS is the minimum (best) score of any of its links. The target score for a gene is the minimum (best) target score of any of its TSSs. Two ‘element’ lists rank TF ChIP-seq peaks as potential regulators of TSSs and genes, respectively. The regulatory element lists group all the links for a given TSS or gene together, and sort within each group in increasing order by link score. Details of list creation are given in the Supplementary Data.

For practical reasons, it is necessary to restrict the set of possible (peak, TSS) links for which CisMapper computes link scores.
First, in this work we restrict CisMapper to links where the TF ChIP-seq peak and the TSS are on the same chromosome and separated by at most 500 Kb. We do this to reduce the required compute time as well as to reduce the number of possible links with low (good) link scores merely due to chance. We note that previous studies that predicted enhancer-promoter links also chose to limit the maximum link length considered for similar reasons (e.g. 125 Kb in Ernst et al. (7), 500 Kb in Thurman et al. (8) and 2 Mb in He et al. (10)). Second, following related work by (19), CisMapper only computes scores for links where there is non-zero variation in the histone level at the peak and the variation in expression at the TSS meets certain criteria. (See Supplementary Data for details.) Subject to the above caveats, CisMapper computes link scores for all possible (peak, TSS) pairs, so each peak can be linked to multiple TSSs, and vice-versa.
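The distance restriction and the ranked lists described in this section can be sketched as follows. This is an illustration only: the data layout, the use of a single coordinate to represent each peak, and all function names are assumptions of the sketch, and the variation filters mentioned above are omitted because their criteria are given only in the Supplementary Data.

```python
from collections import defaultdict

MAX_DIST = 500_000  # distance limit chosen in this work (user configurable)

def candidate_links(peaks, tsss, max_dist=MAX_DIST):
    """Yield (peak_id, tss_id) pairs on the same chromosome within max_dist.

    peaks and tsss map an id to (chromosome, position); using one
    coordinate per peak is a simplifying assumption.
    """
    for pid, (pchrom, ppos) in peaks.items():
        for tid, (tchrom, tpos) in tsss.items():
            if pchrom == tchrom and abs(ppos - tpos) <= max_dist:
                yield pid, tid

def target_lists(link_scores, tss_to_gene):
    """Build the TSS and gene target lists from {(peak, tss): score}.

    A TSS takes the minimum (best) score of its links; a gene takes the
    minimum target score of its TSSs. Both lists are sorted best first.
    """
    tss_score = {}
    for (_, tid), score in link_scores.items():
        tss_score[tid] = min(score, tss_score.get(tid, float("inf")))
    gene_score = {}
    for tid, score in tss_score.items():
        gene = tss_to_gene[tid]
        gene_score[gene] = min(score, gene_score.get(gene, float("inf")))

    def ranked(scores):
        return sorted(scores.items(), key=lambda kv: kv[1])

    return ranked(tss_score), ranked(gene_score)

def element_list(link_scores):
    """Group links by TSS and sort each group by increasing link score."""
    groups = defaultdict(list)
    for (pid, tid), score in link_scores.items():
        groups[tid].append((pid, score))
    return {tid: sorted(links, key=lambda ps: ps[1]) for tid, links in groups.items()}
```

A gene-level element list would follow the same pattern after mapping each TSS to its gene. The nested loop in `candidate_links` is quadratic and is kept only for clarity; sorting positions per chromosome would be the natural choice for genome-scale input.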

Validating predictions using chromatin contacts

We look for direct evidence of physical contact between CisMapper high-scoring (peak, TSS) pairs from promoter capture Hi-C (CHiC) data. We use these data to study (i) the coverage and accuracy of CisMapper predictions, (ii) the necessity of the target (ChIP-ed) tissue in CisMapper's panel and (iii) whether the ChIP-ed TF needs to be expressed in the panel tissues. The chromatin contact data we use are for GM12878 cells (6), which were the highest resolution data available when this study was conducted. To measure accuracy, we use the positive predictive value (PPV), which is equal to one minus the false discovery rate (1−FDR), where a predicted link is confirmed if its two ends overlap the two ends of a promoter-other chromatin contact in the Mifsud et al. (6) data. The CisMapper panel consists of eight tissues (GM12878, Ag04450, H1-hESC, HeLa-S3, HepG2, HUVEC, K562 and NHEK), and the histone (H3K27ac) and expression data (CAGE) come from ENCODE (Supplementary Data lists data sources). TF ChIP-seq peaks are for the 19 TFs in Supplementary Data with ENCODE ChIP-seq data in GM12878 cells. Further details are given in Supplementary Data.
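The PPV calculation described above can be sketched as below. The interval representation (chromosome, start, end; half-open) and the way a TSS is widened into a promoter interval are assumptions of this sketch, as are the function names; the actual overlap criteria are those in the Supplementary Data.

```python
def overlaps(a, b):
    """True if half-open intervals a and b, each (chrom, start, end), overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def is_confirmed(link, contacts):
    """A link is (peak_interval, promoter_interval).

    contacts is a list of (promoter_fragment, other_fragment) pairs. The
    link is confirmed when its promoter end overlaps the promoter fragment
    and its peak end overlaps the other fragment of the same contact.
    """
    peak, promoter = link
    return any(
        overlaps(promoter, contact_promoter) and overlaps(peak, contact_other)
        for contact_promoter, contact_other in contacts
    )

def ppv(links, contacts):
    """Positive predictive value: confirmed links / predicted links (1 - FDR)."""
    if not links:
        return float("nan")
    return sum(is_confirmed(link, contacts) for link in links) / len(links)
```

Applying `ppv` to the subset of links below a score threshold and a length limit gives one accuracy value per TF ChIP-seq data set, which is the quantity summarized across the 19 TFs.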

Validating predictions using differential TF activity

Sikora-Wohlfeld et al. (20) developed the ‘differential TF activity’ evaluation method and used it to evaluate a large number of distance-based predictors of regulatory interactions from TF ChIP-seq data. This evaluation method uses sets of TSSs that are differentially expressed in two tissues in which the ChIP-ed TF is active. They reasoned that if a TF is active in both tissues, some of the changes in gene expression between those tissues should be due to changes in activity of the TF. Hence, the top 500 differentially-expressed TSSs should be enriched for direct targets of the ChIP-ed TF. The figure of merit is the size of the overlap between the top 500 differentially-expressed TSSs and the top 500 predictions of the predictor being evaluated, minus the size of the overlap expected if the predictor guessed randomly. Sikora-Wohlfeld et al. (20) found that the differential TF activity evaluation method gave results consistent with other evaluation methods that use TF perturbation data, functional homogeneity of target genes or consistency of target gene predictions across multiple ChIP-seq data sets, respectively. Note that although we use the evaluation method of Sikora-Wohlfeld et al. (20), we do not use their data or results. A diagram (Supplementary Data) and further details are given in Supplementary Data.
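The figure of merit can be written down in a few lines. The sketch below assumes that a randomly guessing predictor draws its predictions uniformly from a universe of `universe_size` candidate TSSs, so that the expected overlap is the hypergeometric mean; how the universe is actually defined is not stated here, so that parameter and the function name are assumptions.

```python
def adjusted_overlap(predicted, differential, universe_size, k=500):
    """Observed overlap of the two top-k lists minus the chance expectation.

    predicted and differential are ranked lists of TSS identifiers, best
    first. universe_size (number of TSSs a random predictor could pick
    from) is an assumption of this sketch.
    """
    top_predicted = set(predicted[:k])
    top_differential = set(differential[:k])
    observed = len(top_predicted & top_differential)
    expected = len(top_predicted) * len(top_differential) / universe_size
    return observed - expected
```

With k = 500 and, for illustration, a universe of 10,000 TSSs, the chance expectation is 25 shared TSSs, so an observed overlap of 250 would give an adjusted overlap of 225.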

Validating predictions using gene enrichment analysis

We analyze the enrichment of genes predicted by CisMapper or GREAT (21) to be associated with TF ChIP-seq peaks for p300 in embryonic (E14.5) mouse neocortical tissue from Table S1 of Wenger et al. (22). For CisMapper we use Mouse ENCODE histone (H3K27ac) and expression (long polyA+) data for a panel of 22 mouse tissues listed in Supplementary Data and a distance limit of 500 Kb. We use the target gene list produced by CisMapper with a link score threshold of 0.01. We then apply the DAVID (23,24) on-line gene enrichment tool to the gene targets predicted by CisMapper to determine enriched Gene Ontology (25) terms. For comparison, we perform enrichment analysis on the same TF peaks using GREAT with its default region-gene association rule. This associates each peak with every gene whose ‘genomic region’ it overlaps. GREAT defines the genomic region of a gene as a basal domain of −5 Kb to +1 Kb around its TSS, which it then extends up to 1 Mb in either direction, stopping if it encounters another gene's basal domain.
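For readers unfamiliar with the distance-based rule used for comparison, the following is a simplified sketch of the basal-plus-extension association as it is described in the paragraph above. It is not GREAT's implementation: it handles a single chromosome, decides which neighbors lie to the left or right by TSS position, and uses hypothetical function names.

```python
def basal_domain(tss, strand, upstream=5_000, downstream=1_000):
    """Basal domain of -5 Kb to +1 Kb around the TSS, oriented by strand."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream

def regulatory_domains(genes, max_extension=1_000_000):
    """genes maps a gene name to (tss, strand) on one chromosome.

    Each basal domain is extended by up to max_extension in both
    directions, stopping at the basal domain of a neighboring gene.
    """
    basal = {g: basal_domain(tss, strand) for g, (tss, strand) in genes.items()}
    domains = {}
    for gene, (tss, _) in genes.items():
        b_start, b_end = basal[gene]
        start, end = b_start - max_extension, b_end + max_extension
        for other, (other_tss, _) in genes.items():
            if other == gene:
                continue
            o_start, o_end = basal[other]
            if other_tss < tss:      # neighbor to the left
                start = max(start, min(o_end, b_start))
            elif other_tss > tss:    # neighbor to the right
                end = min(end, max(o_start, b_end))
        domains[gene] = (max(start, 0), end)
    return domains

def genes_for_peak(position, domains):
    """Every gene whose regulatory domain contains the peak position."""
    return sorted(g for g, (s, e) in domains.items() if s <= position <= e)
```

Because extension stops at a neighbor's basal domain rather than at the midpoint between genes, a peak lying between two genes is typically assigned to both of them.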

RESULTS

CisMapper accurately predicts contacts between promoters and TF-bound regions

We first demonstrate that CisMapper can accurately predict the long-distance contacts between TF-bound regions and promoters to be expected when a distal TFBS regulates a gene. For validation we use CHiC chromatin contact data (see Materials and Methods), and observe that CisMapper predicted (peak, TSS) links are greatly enriched for chromatin contacts compared with links predicted by distance. Using a panel of eight tissues and TF ChIP-seq peaks for 19 TFs in GM12878 cells, the potential regulatory links predicted by CisMapper with link scores less than 0.01 are at least 73% more likely to be confirmed by CHiC chromatin contact data than all links of the same length (Figure 2). High-confidence CisMapper links (score < 10⁻⁵) shorter than 50 Kb have a median PPV of 0.57 across 19 TF ChIP-seq data sets, whereas all potential (peak, TSS) links shorter than 50 Kb have a median accuracy of only 0.21 (2.7-fold improvement).

Figure 2.

Distribution of the accuracy of peak-TSS links of different maximum lengths predicted by CisMapper for 19 TFs in GM12878 cells. The plot shows the distribution of the accuracy (PPV) of predicted links with lengths less than a given distance and CisMapper link scores less than or equal to 1, 0.1, 0.01, 0.001, 10⁻⁴, 10⁻⁵, 10⁻¹⁰ or 10⁻²⁰ (blue to purple boxplots, from left to right for each distance). Links are validated using CHiC contact data. The red box plots (score ≤ 1) correspond to predicting that TF peaks regulate every TSS within the given distance from them. All CHiC and TF ChIP-seq data are from GM12878 cells. CisMapper links were scored using H3K27ac histone and CAGE expression data from a panel of eight tissues including GM12878, and only positive correlations are considered. The box plots summarize the results for 19 sets of TF ChIP-seq peaks; boxes show the range of the middle quartiles with a line at the median, and dots are outliers further than 1.5 times the interquartile range (the whiskers) from the median.

As shown in Figure 2, the median accuracy of CisMapper-predicted links is higher than that of all similar length links for all tested score thresholds (from 0.1 to 10⁻²⁰) and for all tested link lengths (50–500 Kb). The maximum improvement in accuracy is seen for short links (d < 50 Kb) and score thresholds below 0.001 (2.7-fold improvement in median PPV). Prediction accuracy increases with decreasing link length and increasing score stringency, with a maximum median PPV of 59% for links shorter than 50 Kb and a score threshold of 10⁻⁴ or lower. The higher accuracy of CisMapper predictions relative to distance-based predictions is consistent across the 19 TF ChIP-seq data sets analyzed here (Supplementary Data). The PPV of all CisMapper links predicted at a score threshold of <10⁻⁵ ranges from a high of 37% for RXRA to a low of 12% for ZBTB33. 
For all 19 of the TFs studied in this experiment, the PPV of CisMapper predictions is higher than that of links predicted using a distance threshold yielding a similar length distribution (350 Kb). CisMapper's approach is clearly superior to using distance alone for predicting specific regulatory interactions between bound TFs and TSSs. What is more, predicted links can easily be thresholded on both link score and link length (as done in Figure 2) to select links with high probability (>50%) of corresponding to contacts between promoters and TF-bound regions (Supplementary Data). In this experiment, prediction accuracy for links predicted using a CisMapper score threshold of 0.01 drops below 10% (see Supplementary Data) for the subset of links with lengths in the range 450–500 Kb. While this level of accuracy is still nearly twice as high as using a distance threshold alone, the 500 Kb limit on link length we have chosen here may be a reasonable value in practice. Although the coverage (recall) of CisMapper is relatively low compared to using a simple distance threshold (Supplementary Data), we would argue that this is a reasonable trade-off in circumstances where a set of predicted regulatory links is desired for further examination. Higher PPV means lower FDR, so if predictions will be tested via expensive wet-lab experimentation, a smaller set of predicted links of higher precision may be preferable to a larger set of links that contains a higher proportion of false positives.

The target tissue need not be present in the panel

We wondered if CisMapper could successfully predict potential regulatory interactions using TF ChIP-seq data from a tissue not included in its panel. If true, this would greatly expand its utility. To examine this question we repeated the CHiC validation experiment after removing the target tissue (GM12878) from CisMapper's panel. As seen in Figure 3, using a score threshold of 10⁻⁵, CisMapper's predictions are still substantially more accurate at all distance thresholds than distance alone. On the other hand, including the target tissue in the panel does increase accuracy, especially for links shorter than 50 Kb. It is clear, therefore, that CisMapper is useful for analyzing TF ChIP-seq peaks from tissue types not included in its tissue panel, but accuracy will be better if the panel includes the tissue in which the TF was ChIP-ed.

Figure 3.

Including the ChIP-ed tissue in CisMapper's panel improves accuracy. The plot shows the distribution of the accuracy (PPV, y-axis) of predicted links with lengths less than a given distance (x-axis) and CisMapper link scores less than or equal to 1 (red, distance-only) or 10⁻⁵ when we exclude (purple) or include (dark purple) the ChIP-ed tissue (GM12878) in the tissue panel. The data and methods are the same as in Figure 2.

The ChIP-ed TF need not be expressed in all panel tissues

We also wondered if the ChIP-ed TF needs to be expressed across CisMapper's tissue panel. Consequently we examined the relationship between the accuracy of predicted regulatory links and the level of expression of the TF across the panel for the CHiC validation experiments. As can be seen in Figure 4, there is no discernible relationship between accuracy (PPV) and the expression of the ChIP-ed TF across the panel. For example, the median expression of a single TF varies by four orders of magnitude (from 0.01 to 100 reads-per-million, Figure 4, blue), but this has no consistent effect on the accuracy of CisMapper's predictions. The TF for which CisMapper's predictions are most accurate is RXRA, which has the smallest median and third smallest maximum of expression across the panel of tissues used by CisMapper (data not shown). In fact, RXRA has no measurable expression (according to the ENCODE CAGE data used here) in two of the eight tissues, including in GM12878, the tissue in which it was ChIP-ed. Two other TFs have no measurable expression in five out of eight tissues (data not shown), yet they rank third (BCL11A, PPV = 0.33) and eighth (PU.1, PPV=0.27) in accuracy among the 19 TFs tested here (Supplementary Data).

Figure 4.

Accuracy does not depend strongly on expression of the ChIP-ed TF in the tissue panel. Each point shows the accuracy (PPV) and either the maximum (red) or median (blue) level of expression (reads-per-million, RPM) of a single TF across the panel of eight tissues used by CisMapper to predict regulatory links (score < 10⁻⁵) between ChIP-seq peaks for the TF and TSSs in GM12878 cells. The TFs are labeled and their points are connected with a gray line. The data are from the same experiments as in Figure 2. (The median expression of BCL11A is zero and is not plotted.)

Some TFs show highly tissue specific expression, so we wondered if CisMapper could predict regulatory links for them even if they were not expressed in any tissue included in its panel. We therefore repeated our validation using chromatin contacts after removing any tissue from the panel where the ChIP-ed TF showed measurable expression. In this new experiment, we selected five additional TFs (RUNX3, PAX5, IRF4, IKZF1 and BATF) with ENCODE ChIP-seq peaks in GM12878 because these TFs have measurable expression in GM12878 and at most two other panel tissues. When we exclude these tissues, each panel contains at least five of the original eight panel tissues (but the number and identities of the tissues vary depending on the TF). CisMapper's predictions are still more accurate at all distance thresholds than distance alone (Supplementary Data) for these five ‘tissue specific’ (with respect to the panel) TFs. The ability to make predictions for a TF not expressed in any tissue in the histone/expression panel is likely due to the fact that the TF binds in enhancer regions that are active (and varying) across the panel.

CisMapper is more accurate than distance-based methods

We next explore how CisMapper accuracy compares with distance-based approaches. A recent survey of distance-based methods for linking TF ChIP-seq peaks to genes studied six methods and found two, Linear and ClosestGene, to be consistently superior to the others they tested (20). The window-based Linear method simply adds a value between 0 and 1 to a gene's score for each peak within 10 Kb of the gene's TSS, where the value added decreases linearly with the peak-TSS distance. The ClosestGene method assigns each peak to the nearest gene, then scores the peak based on how well the distance fits the observed distribution of peak-TSS distances, and finally sums all the peak scores for each gene. We applied CisMapper, Linear and ClosestGene to ChIP-seq data for 27 TFs in a variety of tissues (Supplementary Data), and estimated the accuracy of the predictions using Sikora-Wohlfeld et al. (20)'s ‘differential TF activity’ evaluation method (see Materials and Methods). Overall, CisMapper predictions are substantially more accurate than those made by ClosestGene (Figure 5) or Linear (Supplementary Data). The median accuracy of the TSS target predictions made by CisMapper is higher than that of ClosestGene for 26 out of 27 TFs tested (P < 10⁻⁶, sign test), and higher than that of Linear for 25 of 27 TFs tested (P < 10⁻⁵, sign test). For 26 out of 27 TFs, CisMapper correctly identifies between 1.5 and 26.5 more TSS targets than ClosestGene, and correctly identifies 10 times more TSS targets on average (Supplementary Data). CisMapper is also more accurate than ClosestGene for predicting gene (rather than TSS) targets for 20 of 27 TFs (Supplementary Data, P < 0.01, sign test). Here the CisMapper panel of tissues draws from six of the eight following tissues: Ag04450, GM12878, H1-hESC, HeLa-S3, HepG2, HUVEC, K562 and NHEK; the CAGE expression and H3K27ac histone data are from the ENCODE sources listed in Supplementary Data (see Supplementary Data for details).
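The sign-test P-values quoted above can be checked with a short calculation. The sketch assumes a one-sided test with no ties, which the text does not state explicitly; under that assumption the three reported bounds all hold (26 of 27 gives about 2.1 × 10⁻⁷, 25 of 27 about 2.8 × 10⁻⁶, and 20 of 27 about 0.0096).

```python
from math import comb

def sign_test_one_sided(wins, n):
    """P(X >= wins) for X ~ Binomial(n, 0.5): the one-sided sign test.

    wins is the number of TFs for which one method beats the other and
    n is the number of TFs compared (ties are assumed absent).
    """
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
```

A two-sided version would double these values, in which case the 20 of 27 comparison would no longer fall below 0.01; this is why the one-sided reading is assumed here.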

Figure 5.

Validation of CisMapper target TSS predictions using differential TF activity. The figure shows the accuracy (‘Adjusted Overlap’) of TSS target predictions from CisMapper (red) or ClosestGene (grey) for 27 different TFs (X-axis). Each prediction method is allowed to predict 500 TSS targets for a TF, and the adjusted overlap is the number of predicted TSS targets that are also among the 500 most differentially expressed TSSs between the ChIP-ed tissue and another tissue, minus the expected size of the overlap by chance. The box-and-whisker plots summarize the results of between 2 and 30 experiments involving ChIP-seq peaks for the given TF. Outliers that are further than 1.5 times the interquartile range from the median are shown as black dots in the plots. All CisMapper maps are built using H3K27ac histone data and CAGE expression data, and the differentially expressed TSS sets are also based on CAGE data.

To check the consistency of our two evaluation methods, we looked at how they ranked the accuracy of CisMapper predictions on the 19 TF ChIP-seq data sets that we evaluated using both methods. In both these evaluations, CisMapper based its predictions on an enhancer mark (H3K27ac), so we divided the 19 TFs into two groups according to their preference for binding in enhancer regions, based on data from Ernst et al. (26). Supplementary Data shows that for six of the seven TFs that bind preferentially in enhancer regions, CisMapper predictions are ranked highly by both evaluation methods. The notable exception is that the two evaluation methods disagree strongly on the accuracy of the CisMapper predictions for the RXRA ChIP-seq data set. This anomaly may be due to poor quality of the RXRA ChIP-seq data set. There is no significant enrichment of any of the known motifs for RXRA from the JASPAR database (27) in the RXRA ChIP-seq peaks based on a CentriMo (28) motif enrichment analysis (data not shown). 
The high PPV of the links predicted by CisMapper in the RXRA data set according to the chromatin contact evaluation method suggests that those ChIP-seq peaks frequently contain regions in contact with neighboring genes. The low accuracy according to the differential TF activity evaluation is not surprising given the lack of evidence of actual RXRA binding in the peaks. Thus, with the exception of the RXRA data set, both evaluation methods estimate the accuracy of CisMapper predictions based on an enhancer mark to be generally highest for TFs binding primarily in enhancer regions, as would be expected.

CisMapper can use a variety of histone marks

Thus far we have only presented results based on using the active enhancer histone mark H3K27ac in CisMapper's tissue panel. When we repeat the TSS target prediction experiment above using histone data for the active promoter histone mark H3K4me3 in place of the H3K27ac data used above, CisMapper is more accurate than ClosestGene, although the comparative advantage is smaller than when using H3K27ac (Supplementary Data). For 21 of 27 TFs, the median accuracy of CisMapper predictions is higher than that of ClosestGene (P < 0.003, sign test), compared with 26 of 27 TFs when CisMapper uses H3K27ac data (Figure 5). We also examined using histone marks H3K27me3, associated with poised enhancers (29), and H3K36me3, associated with active enhancers and transcribed genes (17). We found that the accuracy of predicted links was somewhat lower using these two marks (data not shown). These results suggest that CisMapper can be used effectively with ChIP-seq data for histone marks other than H3K27ac should data for that mark not be available for enough tissues to build a panel (see next section).

Increasing panel size improves CisMapper coverage and accuracy

We assumed that CisMapper coverage and accuracy should increase with the size of the panel of tissues it uses for computing peak-TSS correlations. To test this we again used the differential TF activity method, but switched to data from the more extensive Roadmap Epigenomics Project (15) to allow us to create panels of 5 to 30 tissues using histone ChIP-seq data for H3K4me3 and polyA+ RNA-seq expression data. Since RNA-seq data do not identify the TSS as accurately as CAGE data, we use the gene target list output by CisMapper rather than its TSS target list in this evaluation. (See Supplementary Data for details.) The accuracy of CisMapper target predictions increases with the panel size (Figure 6). The median of the adjusted overlap score almost triples over the range of panel sizes we tested (5–30). What is more, the coverage of CisMapper target predictions increases with the panel size (Supplementary Data), as might be expected due to the increased statistical power of larger panels. A similar increase in accuracy between tissue panels of size 5 and 30 is seen for each of the 19 individual TFs we tested (Supplementary Data). Although we observe a plateau in the accuracy of CisMapper gene target predictions when the panel size reaches 25 tissues (Figure 6), for most of the 19 TFs we tested, the number of gene targets predicted by CisMapper at a link score threshold of 0.001 more than doubles. This plateau is probably due to limitations in the available data reducing the diversity of any additional tissues added to the panel beyond 25. (See Supplementary Data for further discussion of this issue.)

Figure 6.

Accuracy increases with panel size. The plot summarizes the distribution of the median adjusted overlap score (y-axis) for the top 500 gene target predictions of 19 TFs as the number of tissues used by CisMapper (‘Panel Size’, x-axis) increases. The TFs used are those with three or more ChIP-seq peak sets in Supplementary Data. The expression data used for validation comes from CAGE expression in ENCODE tissues. CisMapper gene target predictions use histone (H3K4me3) and expression (long polyA+) data from subsets of the 38 tissues from the Roadmap Epigenomics Project. Smaller panels are always a subset of the next larger panel, and values in the figures are averages over 15 independent nested panel sets.
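The nested panel design described in this caption can be sketched as follows. The specific panel sizes (steps of five from 5 to 30), the random selection procedure, and the function names are assumptions made for illustration; only the nesting property, the 38-tissue pool and the 15 independent sets come from the text.

```python
import random

def nested_panels(tissues, sizes, seed=None):
    """Return {size: panel} where each smaller panel is a subset of the next.

    The tissues are shuffled once and each panel is a prefix of that
    random order, which guarantees nesting.
    """
    rng = random.Random(seed)
    order = list(tissues)
    rng.shuffle(order)
    return {size: order[:size] for size in sorted(sizes)}

def panel_sets(tissues, sizes, n_sets=15, seed=0):
    """Generate n_sets independent nested panel sets, one per seed."""
    return [nested_panels(tissues, sizes, seed=seed + i) for i in range(n_sets)]

# Hypothetical tissue labels standing in for the 38 Roadmap tissues.
tissues = [f"tissue_{i:02d}" for i in range(38)]
sets_of_panels = panel_sets(tissues, sizes=(5, 10, 15, 20, 25, 30))
```

Averaging an accuracy measure over the 15 sets at each panel size then gives one value per size, while the nesting ensures that differences between sizes reflect added tissues rather than a different choice of tissues.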

CisMapper scores are calibrated

Using data from the previous section, we checked that CisMapper scores are ‘calibrated’ in the sense that a given score corresponds to the same accuracy regardless of panel size. This is evidenced by the scatter plot in Figure 7, which shows the accuracy (y-axis) of gene target predictions using the link score threshold given on the x-axis. Each point represents the median CisMapper results for one TF ChIP-seq data set, averaged over the different tissue subset panels, as described above. The X-value of each point is the median of the link score of the 500th gene in the target list, and the Y-value is the median accuracy (adjusted overlap scores).

Figure 7.

Score calibration does not depend on panel size. Each point represents a single TF, and shows the median link score of the 500th ranked target (x-axis) versus the median accuracy of CisMapper of those 500 target gene predictions (y-axis), averaged across all ChIP-seq peak sets for the TF, with tissue panels of size 5 (grey) or 30 (blue). The data are from the same experiments as in Figure 6.

Two things are clear from Figure 7. First, there is a very strong correlation between the CisMapper link score threshold and gene target prediction accuracy. Secondly, the slope of this correlation is essentially unchanged when CisMapper uses a panel of five tissues (grey points) or 30 tissues (blue points). This implies that the prediction accuracy when using a given link score threshold does not depend strongly on panel size. Therefore, a reasonable choice of link score threshold will remain so regardless of how many paired histone-expression data sets are provided as input to CisMapper. Thus, the main effect of increasing panel size is to increase the coverage (number of predictions) at a given link score, while maintaining the accuracy of those predictions.
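As a rough illustration of how one point in Figure 7 could be derived, the sketch below takes the link score of the 500th-ranked target in each run and summarizes the runs by their medians. It rests on assumptions that the text does not confirm: that smaller link scores indicate stronger predicted links (suggested by the 0.001 threshold), and that a plain median over runs is an adequate stand-in for the averaging described in the text. The function names and toy numbers are hypothetical.

```python
from statistics import median

def score_at_rank(link_scores, rank=500):
    """Link score of the rank-th target when targets are sorted best-first.

    Assumes smaller link scores indicate stronger predicted links.
    """
    ordered = sorted(link_scores)
    if len(ordered) < rank:
        raise ValueError("fewer targets than the requested rank")
    return ordered[rank - 1]

def calibration_point(runs, rank=500):
    """Summarize several runs as one (x, y) point.

    `runs` is a list of (link_scores, accuracy) pairs, one per tissue panel.
    """
    xs = [score_at_rank(scores, rank) for scores, _ in runs]
    ys = [accuracy for _, accuracy in runs]
    return median(xs), median(ys)

# Toy data: three panels, using rank 2 instead of 500 to keep the example small.
toy_runs = [
    ([0.01, 0.002, 0.3], 1.5),
    ([0.004, 0.05, 0.001], 2.0),
    ([0.02, 0.03, 0.006], 1.0),
]
point = calibration_point(toy_runs, rank=2)
```

Plotting such points for panels of different sizes on the same axes is what allows the slopes to be compared directly.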

CisMapper predictions can improve gene enrichment analyses

Perhaps the most common downstream analysis applied to TF target gene predictions is gene enrichment analysis, and we wondered if this type of analysis would benefit from the improved accuracy of CisMapper predictions. To address this, we compare gene enrichment analysis of gene targets predicted by CisMapper with a similar analysis using the distance-based enrichment analysis tool GREAT (21). The TF ChIP-seq peaks are for p300 in embryonic (E14.5) mouse neocortical tissue (22). Given the tissue and stage of neocortical development, we expect p300-bound regions to regulate many neural-development related functions. In this example, the gene enrichment analysis based on the CisMapper predicted targets appears more informative than analysis based on distance-based target prediction (see Supplementary Data). Although the GREAT tool identifies many neural-related biological processes and molecular functions enriched among its predicted 4676 gene targets (22), the 938 gene targets predicted by CisMapper are enriched for important neural-related processes and functions that are not identified by GREAT. For example, only the CisMapper-predicted targets are enriched for genes involved in the neural projection biological process (Supplementary Data), a critical process in neuron formation within the cortex (30). CisMapper also scores a key regulator of neural projection in neuron development, Fezf2 (31,32), as a top target. Furthermore, CisMapper predictions identify genes primarily enriched in ion transport and charge potentiation molecular functions (Supplementary Data), crucial to the excitatory function of pyramidal neurons in the neocortex (33). These are missing from the GREAT predictions, which mainly identify transcription-related functions. 
Finally, there are no enriched cellular component terms among the GREAT-predicted gene targets, whereas terms highly relevant to neocortical neurons such as ‘neural projection’, ‘plasma membrane’ (the location of ion channels), ‘axon’ and ‘synapse’ are enriched among the CisMapper-predicted gene targets (Supplementary Data).

DISCUSSION

Several previous studies have sought methods for accurately identifying the gene targets of regulatory regions (7–11,34) using auxiliary data on gene expression, TF binding, DNaseI hypersensitivity and histone modifications. Although demonstrably more accurate, these methods have not supplanted simple distance-based association of TF ChIP-seq peaks with putative target genes in practice. This is probably due mainly to the relative simplicity of distance-based methods, as well as to the fact that the more advanced methods have not been explicitly validated on regulatory regions defined by TF ChIP-seq peaks. We developed CisMapper to provide a method that is more accurate than simple distance-based methods, but that places a minimum burden on the user to provide auxiliary data. CisMapper uses only data for a single histone modification and gene expression across a small panel of tissues, requires no training step and has been extensively evaluated here as an alternative to distance-based methods for analyzing TF ChIP-seq peaks.

CisMapper can analyze TF ChIP-seq peaks to predict regulatory links between TF binding sites and the TSSs of genes. It predicts these links using cross-tissue correlation between histone marks overlapping the TF binding site and expression at the TSS. The target lists output by CisMapper can be used to predict either which TSSs or which genes a given TF regulates. Similarly, the regulatory element lists it outputs can be used to predict which specific TF binding sites are most likely to regulate a given TSS or gene.

We have shown that the regulatory links predicted by CisMapper coincide with chromatin contacts at a higher rate than links predicted based on the distance between the binding site and the TSS, the current method of choice. Direct chromatin contact between a bound TF and a TSS is highly suggestive of a possible regulatory interaction, which is what CisMapper is intended to predict.
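The cross-tissue correlation idea summarized above can be illustrated with a small sketch. It is not CisMapper's implementation: the use of the Pearson coefficient, the function name `pearson`, the choice to return 0.0 when a signal does not vary across the panel, and the toy signal values are all assumptions made for illustration. The text does not specify here how the correlation is converted into a link score.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of per-tissue values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two tissues")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumed convention: no variation across the panel gives no evidence of a link.
        return 0.0
    return sxy / sqrt(sxx * syy)

# Hypothetical histone signal at one TF-bound site across a five-tissue panel.
histone_at_peak = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical expression at two candidate TSSs across the same five tissues.
expression_tss_a = [2.0, 4.0, 6.0, 8.0, 10.0]   # tracks the histone signal
expression_tss_b = [3.0, 3.0, 3.0, 3.0, 3.0]    # constant across tissues

r_a = pearson(histone_at_peak, expression_tss_a)
r_b = pearson(histone_at_peak, expression_tss_b)
```

In this toy case the site would be linked to the first TSS rather than the second, regardless of which TSS is closer along the chromosome.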
We also report experiments using the differential TF activity evaluation method to show that CisMapper's lists of the gene and TSS targets of a TF have higher accuracy than predictions made by distance-based methods. We have also shown that CisMapper is especially accurate for predicting long-distance regulatory links that are beyond the reach of distance-based prediction methods, and that as more histone and expression data become available across a larger number of tissues, the accuracy of CisMapper's regulatory predictions will improve. Based on these results, we believe that CisMapper is a valuable addition to the standard bioinformatic toolkit for analyzing TF ChIP-seq data.

Importantly, we have shown that CisMapper requires neither histone nor expression data from the tissue of interest, only the genomic loci of the ChIP-seq peaks for a TF in that tissue. However, if such histone and expression data are available, they can and should be included in CisMapper's input, as we expect them to improve prediction accuracy. We have also shown that CisMapper does not require the TF to be expressed in any of the panel tissues to accurately predict regulatory links to its TFBS. This suggests that even if a TF's expression is tissue specific, CisMapper can still detect when it binds to enhancers showing varying activity across CisMapper's tissue panel.

Suitable compendia of histone mark and expression data currently exist for using CisMapper to analyze TF ChIP-seq data from human, mouse, fly and worm. For analyzing human data, extensive histone and expression data are available from the Roadmap Epigenomics Project (15), from ENCODE (13) and from FANTOM5 ((35); expression data only). Data for mouse are available from the mouse ENCODE project (14), and a mouse blood-specific compendium has been published recently (36). The modENCODE project provides data for both fly (37) and worm (38).
Each of these compendia contains matched histone and expression data from seven to over 100 tissues, and our results show that CisMapper can make useful regulatory predictions when provided with such data for as few as five tissues in the organism of interest.

While we have shown that CisMapper predictions are more accurate than distance-based predictions, the coverage of CisMapper and distance-based methods is quite distinct. On the one hand, distance-based methods are confounded when chromatin looping causes a TF binding site to regulate a TSS other than the nearest one. On the other hand, CisMapper can only predict a regulatory link between a TF binding site and a TSS when there is variation in their histone mark and expression, respectively, across the tissues in the histone/expression compendia provided to CisMapper. Consequently, the regulatory predictions made by CisMapper are somewhat complementary to those made by distance-based methods.

Due to the complementarity of the distance- and correlation-based approaches to regulatory interaction prediction, a future version of CisMapper will integrate genomic distance directly with histone-expression correlation in calculating the link score. We anticipate this will improve CisMapper's coverage. In the meantime, we recommend analyzing TF ChIP-seq peaks with both CisMapper and a distance-based method. The CisMapper predictions will provide a higher quality set of predicted targets and regulatory binding sites, and the union of those predictions with the distance-based predictions will provide a higher-coverage, albeit less-accurate, set.

CisMapper predictions of regulatory links are also complementary to those inferred from chromatin conformation capture (CCC) data because they are based on completely different types of evidence.
Specifically, the link score that CisMapper calculates for a pair of loci indicates how related histone and expression levels are between the loci, whereas the read count for a pair of loci produced by a conformation capture assay, after conversion to a score that corrects for distance-dependent and other biases, can be used to infer if the two loci are in contact. Thus, CisMapper and chromatin conformation capture assays (e.g. 3C (39), 4C (40), 5C (41), Hi-C (42), ChIA-PET (43) or CHiC (6)) provide scores that are independent predictions of regulatory interactions between pairs of genomic loci. This independence suggests that intersecting the sets of loci pairs predicted by CisMapper with those predicted by CCC in the same tissue should yield an even more accurate set of predicted regulatory interactions.

Analyses of the regulation of expression by a transcription factor should benefit from CisMapper's more accurate and highly specific predictions of regulatory links between its binding sites and particular TSSs. For example, when searching for regulatory SNPs, it is reasonable to assume that those contained in TF binding sites predicted by CisMapper to be regulatory are more likely to be important biologically. (Note that we assume that the binding sites can be identified within the TF-bound regions predicted by CisMapper via standard motif-based methods (44).) Likewise, gene ontology analysis (25) performed using the more accurately predicted target gene set provided by CisMapper should better elucidate the biological roles of the ChIP-ed TF. Finally, when validating predicted regulatory binding sites via genome editing (e.g. using CRISPR/Cas (25)), CisMapper's ability to associate specific binding sites with a gene and to rank them by regulatory potential should prove invaluable.

The use of CisMapper need not be restricted to the analysis of TF ChIP-seq data.
CisMapper can take as input any set of loci (expressed as a BED file) from the genome of interest, and will generate lists of the TSSs and genes that those regions may regulate. Previously we showed that the cross-tissue histone-expression correlation approach used by CisMapper can predict regulatory links between enhancers and TSSs (12), providing the first validation of this idea (19,45). As noted above, distance-based methods cannot reliably distinguish which TSS might be regulated by a given locus due to the possibility of chromatin looping. This ability to make TSS-specific predictions of regulation by arbitrary genomic loci is a novel feature of CisMapper.

A second novel feature of CisMapper is that it can utilize data for any type of histone mark in making its predictions, and the regulatory links it predicts will depend on the histone mark chosen (e.g. H3K27ac or H3K4me3). By contrast, distance-based methods do not make predictions that take into account the histone state of the predicted regulatory loci. In future work we will explore running CisMapper using a series of distinct histone marks in order to classify links according to their ‘histone profiles’, that is, the set of histone marks that identify the given link. This may allow us to group regulatory links into biologically relevant classes (e.g. activating, repressing, promoter-specific, enhancer-specific, etc.) in a way analogous to previous work that uses histone profiles to assign genomic loci to classes such as promoter, enhancer, insulator, etc. (46,47). In principle, this link-profiling approach might be used to classify links predicted by CisMapper from TF binding sites (ChIP-seq peaks), enhancers, disease-associated SNPs or chromatin conformation contact data.
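The two ways of combining prediction sets recommended in this Discussion (a union with distance-based predictions for coverage, and an intersection with chromatin conformation capture contacts for accuracy) amount to simple set operations. The sketch below is illustrative only: representing a link as a `(site_id, tss_id)` tuple, the function name `combine_predictions` and the example identifiers are assumptions, not CisMapper output formats.

```python
def combine_predictions(cismapper_links, distance_links, ccc_links=None):
    """Combine predicted regulatory links, each given as a (site_id, tss_id) tuple.

    Returns a dict with:
      "high_coverage": union of CisMapper and distance-based links
      "high_accuracy": CisMapper links, intersected with CCC links if provided
    """
    cismapper = set(cismapper_links)
    distance = set(distance_links)
    high_accuracy = cismapper if ccc_links is None else cismapper & set(ccc_links)
    return {
        "high_coverage": cismapper | distance,
        "high_accuracy": high_accuracy,
    }

# Hypothetical identifiers for three binding sites and four TSSs.
cismapper_links = [("site1", "tssA"), ("site2", "tssC")]
distance_links = [("site1", "tssA"), ("site2", "tssB"), ("site3", "tssD")]
ccc_links = [("site2", "tssC"), ("site3", "tssD")]

combined = combine_predictions(cismapper_links, distance_links, ccc_links)
```

Keeping the two results separate lets an analysis use the intersection for follow-up experiments and the union for exploratory screening.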

47 in total

1. Zfp312 is required for subcortical axonal projections and dendritic morphology of deep-layer pyramidal neurons of the cerebral cortex.

Authors: Jie-Guang Chen; Mladen-Roko Rasin; Kenneth Y Kwan; Nenad Sestan
Journal: Proc Natl Acad Sci U S A Date: 2005-11-28 Impact factor: 11.205

2. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements.

Authors: Josée Dostie; Todd A Richmond; Ramy A Arnaout; Rebecca R Selzer; William L Lee; Tracey A Honan; Eric D Rubio; Anton Krumm; Justin Lamb; Chad Nusbaum; Roland D Green; Job Dekker
Journal: Genome Res Date: 2006-09-05 Impact factor: 9.043

3. Insights from genomic profiling of transcription factors.

Authors: Peggy J Farnham
Journal: Nat Rev Genet Date: 2009-08-11 Impact factor: 53.242

4. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project.

Authors: Mark B Gerstein; Zhi John Lu; Eric L Van Nostrand; Chao Cheng; Bradley I Arshinoff; Tao Liu; Kevin Y Yip; Rebecca Robilotto; Andreas Rechtsteiner; Kohta Ikegami; Pedro Alves; Aurelien Chateigner; Marc Perry; Mitzi Morris; Raymond K Auerbach; Xin Feng; Jing Leng; Anne Vielle; Wei Niu; Kahn Rhrissorrakrai; Ashish Agarwal; Roger P Alexander; Galt Barber; Cathleen M Brdlik; Jennifer Brennan; Jeremy Jean Brouillet; Adrian Carr; Ming-Sin Cheung; Hiram Clawson; Sergio Contrino; Luke O Dannenberg; Abby F Dernburg; Arshad Desai; Lindsay Dick; Andréa C Dosé; Jiang Du; Thea Egelhofer; Sevinc Ercan; Ghia Euskirchen; Brent Ewing; Elise A Feingold; Reto Gassmann; Peter J Good; Phil Green; Francois Gullier; Michelle Gutwein; Mark S Guyer; Lukas Habegger; Ting Han; Jorja G Henikoff; Stefan R Henz; Angie Hinrichs; Heather Holster; Tony Hyman; A Leo Iniguez; Judith Janette; Morten Jensen; Masaomi Kato; W James Kent; Ellen Kephart; Vishal Khivansara; Ekta Khurana; John K Kim; Paulina Kolasinska-Zwierz; Eric C Lai; Isabel Latorre; Amber Leahey; Suzanna Lewis; Paul Lloyd; Lucas Lochovsky; Rebecca F Lowdon; Yaniv Lubling; Rachel Lyne; Michael MacCoss; Sebastian D Mackowiak; Marco Mangone; Sheldon McKay; Desirea Mecenas; Gennifer Merrihew; David M Miller; Andrew Muroyama; John I Murray; Siew-Loon Ooi; Hoang Pham; Taryn Phippen; Elicia A Preston; Nikolaus Rajewsky; Gunnar Rätsch; Heidi Rosenbaum; Joel Rozowsky; Kim Rutherford; Peter Ruzanov; Mihail Sarov; Rajkumar Sasidharan; Andrea Sboner; Paul Scheid; Eran Segal; Hyunjin Shin; Chong Shou; Frank J Slack; Cindie Slightam; Richard Smith; William C Spencer; E O Stinson; Scott Taing; Teruaki Takasaki; Dionne Vafeados; Ksenia Voronina; Guilin Wang; Nicole L Washington; Christina M Whittle; Beijing Wu; Koon-Kiu Yan; Georg Zeller; Zheng Zha; Mei Zhong; Xingliang Zhou; Julie Ahringer; Susan Strome; Kristin C Gunsalus; Gos Micklem; X Shirley Liu; Valerie Reinke; Stuart K Kim; LaDeana W Hillier; Steven Henikoff; Fabio Piano; Michael 
Snyder; Lincoln Stein; Jason D Lieb; Robert H Waterston
Journal: Science Date: 2010-12-22 Impact factor: 47.728

5. A map of the cis-regulatory sequences in the mouse genome.

Authors: Yin Shen; Feng Yue; David F McCleary; Zhen Ye; Lee Edsall; Samantha Kuan; Ulrich Wagner; Jesse Dixon; Leonard Lee; Victor V Lobanenkov; Bing Ren
Journal: Nature Date: 2012-08-02 Impact factor: 49.962

6. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors.

Authors: Kevin Y Yip; Chao Cheng; Nitin Bhardwaj; James B Brown; Jing Leng; Anshul Kundaje; Joel Rozowsky; Ewan Birney; Peter Bickel; Michael Snyder; Mark Gerstein
Journal: Genome Biol Date: 2012-09-26 Impact factor: 13.583

7. A promoter-level mammalian expression atlas.

Authors: Alistair R R Forrest; Hideya Kawaji; Michael Rehli; J Kenneth Baillie; Michiel J L de Hoon; Vanja Haberle; Timo Lassmann; Ivan V Kulakovskiy; Marina Lizio; Masayoshi Itoh; Robin Andersson; Christopher J Mungall; Terrence F Meehan; Sebastian Schmeier; Nicolas Bertin; Mette Jørgensen; Emmanuel Dimont; Erik Arner; Christian Schmidl; Ulf Schaefer; Yulia A Medvedeva; Charles Plessy; Morana Vitezic; Jessica Severin; Colin A Semple; Yuri Ishizu; Robert S Young; Margherita Francescatto; Intikhab Alam; Davide Albanese; Gabriel M Altschuler; Takahiro Arakawa; John A C Archer; Peter Arner; Magda Babina; Sarah Rennie; Piotr J Balwierz; Anthony G Beckhouse; Swati Pradhan-Bhatt; Judith A Blake; Antje Blumenthal; Beatrice Bodega; Alessandro Bonetti; James Briggs; Frank Brombacher; A Maxwell Burroughs; Andrea Califano; Carlo V Cannistraci; Daniel Carbajo; Yun Chen; Marco Chierici; Yari Ciani; Hans C Clevers; Emiliano Dalla; Carrie A Davis; Michael Detmar; Alexander D Diehl; Taeko Dohi; Finn Drabløs; Albert S B Edge; Matthias Edinger; Karl Ekwall; Mitsuhiro Endoh; Hideki Enomoto; Michela Fagiolini; Lynsey Fairbairn; Hai Fang; Mary C Farach-Carson; Geoffrey J Faulkner; Alexander V Favorov; Malcolm E Fisher; Martin C Frith; Rie Fujita; Shiro Fukuda; Cesare Furlanello; Masaaki Furino; Jun-ichi Furusawa; Teunis B Geijtenbeek; Andrew P Gibson; Thomas Gingeras; Daniel Goldowitz; Julian Gough; Sven Guhl; Reto Guler; Stefano Gustincich; Thomas J Ha; Masahide Hamaguchi; Mitsuko Hara; Matthias Harbers; Jayson Harshbarger; Akira Hasegawa; Yuki Hasegawa; Takehiro Hashimoto; Meenhard Herlyn; Kelly J Hitchens; Shannan J Ho Sui; Oliver M Hofmann; Ilka Hoof; Furni Hori; Lukasz Huminiecki; Kei Iida; Tomokatsu Ikawa; Boris R Jankovic; Hui Jia; Anagha Joshi; Giuseppe Jurman; Bogumil Kaczkowski; Chieko Kai; Kaoru Kaida; Ai Kaiho; Kazuhiro Kajiyama; Mutsumi Kanamori-Katayama; Artem S Kasianov; Takeya Kasukawa; Shintaro Katayama; Sachi Kato; Shuji Kawaguchi; Hiroshi Kawamoto; Yuki I Kawamura; 
Tsugumi Kawashima; Judith S Kempfle; Tony J Kenna; Juha Kere; Levon M Khachigian; Toshio Kitamura; S Peter Klinken; Alan J Knox; Miki Kojima; Soichi Kojima; Naoto Kondo; Haruhiko Koseki; Shigeo Koyasu; Sarah Krampitz; Atsutaka Kubosaki; Andrew T Kwon; Jeroen F J Laros; Weonju Lee; Andreas Lennartsson; Kang Li; Berit Lilje; Leonard Lipovich; Alan Mackay-Sim; Ri-ichiroh Manabe; Jessica C Mar; Benoit Marchand; Anthony Mathelier; Niklas Mejhert; Alison Meynert; Yosuke Mizuno; David A de Lima Morais; Hiromasa Morikawa; Mitsuru Morimoto; Kazuyo Moro; Efthymios Motakis; Hozumi Motohashi; Christine L Mummery; Mitsuyoshi Murata; Sayaka Nagao-Sato; Yutaka Nakachi; Fumio Nakahara; Toshiyuki Nakamura; Yukio Nakamura; Kenichi Nakazato; Erik van Nimwegen; Noriko Ninomiya; Hiromi Nishiyori; Shohei Noma; Shohei Noma; Tadasuke Noazaki; Soichi Ogishima; Naganari Ohkura; Hiroko Ohimiya; Hiroshi Ohno; Mitsuhiro Ohshima; Mariko Okada-Hatakeyama; Yasushi Okazaki; Valerio Orlando; Dmitry A Ovchinnikov; Arnab Pain; Robert Passier; Margaret Patrikakis; Helena Persson; Silvano Piazza; James G D Prendergast; Owen J L Rackham; Jordan A Ramilowski; Mamoon Rashid; Timothy Ravasi; Patrizia Rizzu; Marco Roncador; Sugata Roy; Morten B Rye; Eri Saijyo; Antti Sajantila; Akiko Saka; Shimon Sakaguchi; Mizuho Sakai; Hiroki Sato; Suzana Savvi; Alka Saxena; Claudio Schneider; Erik A Schultes; Gundula G Schulze-Tanzil; Anita Schwegmann; Thierry Sengstag; Guojun Sheng; Hisashi Shimoji; Yishai Shimoni; Jay W Shin; Christophe Simon; Daisuke Sugiyama; Takaai Sugiyama; Masanori Suzuki; Naoko Suzuki; Rolf K Swoboda; Peter A C 't Hoen; Michihira Tagami; Naoko Takahashi; Jun Takai; Hiroshi Tanaka; Hideki Tatsukawa; Zuotian Tatum; Mark Thompson; Hiroo Toyodo; Tetsuro Toyoda; Elvind Valen; Marc van de Wetering; Linda M van den Berg; Roberto Verado; Dipti Vijayan; Ilya E Vorontsov; Wyeth W Wasserman; Shoko Watanabe; Christine A Wells; Louise N Winteringham; Ernst Wolvetang; Emily J Wood; Yoko Yamaguchi; Masayuki 
Yamamoto; Misako Yoneda; Yohei Yonekura; Shigehiro Yoshida; Susan E Zabierowski; Peter G Zhang; Xiaobei Zhao; Silvia Zucchelli; Kim M Summers; Harukazu Suzuki; Carsten O Daub; Jun Kawai; Peter Heutink; Winston Hide; Tom C Freeman; Boris Lenhard; Vladimir B Bajic; Martin S Taylor; Vsevolod J Makeev; Albin Sandelin; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal: Nature Date: 2014-03-27 Impact factor: 49.962

8. Epigenomics of human embryonic stem cells and induced pluripotent stem cells: insights into pluripotency and implications for disease.

Authors: Alvaro Rada-Iglesias; Joanna Wysocka
Journal: Genome Med Date: 2011-06-07 Impact factor: 11.117

9. Integrative analysis of 111 reference human epigenomes.

Authors: Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal: Nature Date: 2015-02-19 Impact factor: 69.504

10. The accessible chromatin landscape of the human genome.

Authors: Robert E Thurman; Eric Rynes; Richard Humbert; Jeff Vierstra; Matthew T Maurano; Eric Haugen; Nathan C Sheffield; Andrew B Stergachis; Hao Wang; Benjamin Vernot; Kavita Garg; Sam John; Richard Sandstrom; Daniel Bates; Lisa Boatman; Theresa K Canfield; Morgan Diegel; Douglas Dunn; Abigail K Ebersol; Tristan Frum; Erika Giste; Audra K Johnson; Ericka M Johnson; Tanya Kutyavin; Bryan Lajoie; Bum-Kyu Lee; Kristen Lee; Darin London; Dimitra Lotakis; Shane Neph; Fidencio Neri; Eric D Nguyen; Hongzhu Qu; Alex P Reynolds; Vaughn Roach; Alexias Safi; Minerva E Sanchez; Amartya Sanyal; Anthony Shafer; Jeremy M Simon; Lingyun Song; Shinny Vong; Molly Weaver; Yongqi Yan; Zhancheng Zhang; Zhuzhu Zhang; Boris Lenhard; Muneesh Tewari; Michael O Dorschner; R Scott Hansen; Patrick A Navas; George Stamatoyannopoulos; Vishwanath R Iyer; Jason D Lieb; Shamil R Sunyaev; Joshua M Akey; Peter J Sabo; Rajinder Kaul; Terrence S Furey; Job Dekker; Gregory E Crawford; John A Stamatoyannopoulos
Journal: Nature Date: 2012-09-06 Impact factor: 49.962
