Hana Imrichová, Gert Hulselmans, Zeynep Kalender Atak, Delphine Potier, Stein Aerts.
Abstract
i-cisTarget is a web tool to predict regulators of a set of genomic regions, such as ChIP-seq peaks or co-regulated/similar enhancers. i-cisTarget can also be used to identify upstream regulators and their target enhancers starting from a set of co-expressed genes. Whereas the original version of i-cisTarget was focused on Drosophila data, the 2015 update also provides support for human and mouse data. i-cisTarget detects transcription factor motifs (position weight matrices) and experimental data tracks (e.g. from ENCODE, Roadmap Epigenomics) that are enriched in the input set of regions. As experimental data tracks we include transcription factor ChIP-seq data, histone modification ChIP-seq data and open chromatin data. The underlying processing method is based on a ranking-and-recovery procedure, allowing accurate determination of enrichment across heterogeneous datasets, while also discriminating direct from indirect target regions through a 'leading edge' analysis. We illustrate i-cisTarget on various Ewing sarcoma datasets to identify EWS-FLI1 targets starting from ChIP-seq, differential ATAC-seq, differential H3K27ac and differential gene expression data. Use of i-cisTarget is free and open to all, and there is no login requirement. Address: http://gbiomed.kuleuven.be/apps/lcb/i-cisTarget.
Year: 2015 PMID: 25925574 PMCID: PMC4489282 DOI: 10.1093/nar/gkv395
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Publicly available regulatory datasets used to create i-cisTarget human CRRs (A) and publicly available regulatory datasets used to create i-cisTarget mouse CRRs (B)

(A) Human

| | GBP | CpG | Proximal promoters | CNS | UCR | Oreganno | Vista enhancers | CRMs | DHS |
|---|---|---|---|---|---|---|---|---|---|
| Number of regions | 61550 | 27718 | 34722 | 232101 | 15931 | 23112 | 1339 | 123500 | 1281988 |
| % of the genome | 1.77 | 0.73 | 0.67 | 2.25 | 0.13 | 0.39 | 0.07 | 2.05 | 13.36 |

(B) Mouse

| | CpG | Proximal promoters | CNS | UCR | Oreganno | Vista enhancers | CRMs | UCNE | DHS |
|---|---|---|---|---|---|---|---|---|---|
| Number of regions | 16026 | 22984 | 231478 | 15927 | 16976 | 339 | 91176 | 4335 | 14971709 |
| % of the genome | 0.40 | 0.41 | 2.49 | 0.14 | 0.38 | 0.02 | 1.66 | 0.05 | 29.16 |
Figure 1. i-cisTarget workflow. The i-cisTarget web tool consists of two parts, namely the ranking (offline part) and the recovery (online part). (a) A set of 1,223,024 candidate regulatory regions (CRRs) is defined based on publicly available regulatory data, representing 35% of the human genome. (b) The collection of CRRs is scored and ranked according to different features, including motifs, TF and histone ChIP-seq, DNase-seq and FAIRE-seq, resulting in large ranking databases. (c) The online part starts with user input, which can be a set of genomic regions or a set of genes for human, mouse or fly. (d) The input set is mapped to the candidate i-cisTarget regions. In the case of regions/peaks, the CRRs that overlap the peaks (the minimum overlap percentage is a parameter) are considered in the analysis. When a gene set is used as input, the CRRs overlapping the entire space of X kb around the TSSs are taken into the analysis (the default space for the human and mouse genomes is 20 kb around the TSS; the default for fly is 5 kb upstream of the TSS plus all introns). (e) The recovery analysis identifies the features for which the input CRRs are most enriched at the top of the CRR ranking of that feature. This enrichment is calculated as the Area Under the recovery Curve (AUC), and all features with a normalized AUC (i.e. Normalized Enrichment Score, or NES) above 3.0 are returned. (f) For each enriched feature and upstream regulator, the direct target regions are provided, with a link to a BED file for download and a track in the UCSC Genome Browser.
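Steps (d) and (e) of the workflow can be illustrated with a minimal Python sketch. It is not the i-cisTarget implementation: the function names, the overlap definition (fraction of the CRR covered by a single input region), the 0.4 default, the `rank_cutoff` parameter and the z-score form of the normalization are all assumptions made for illustration. Only the idea of recovering input CRRs along a per-feature ranking and the NES threshold of 3.0 come from the caption.

```python
import numpy as np


def map_regions_to_crrs(regions, crrs, min_overlap=0.4):
    """Return indices of CRRs covered by an input region for >= min_overlap of their length.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    The overlap definition and the 0.4 default are assumptions, not tool defaults.
    """
    hits = []
    for i, (c_chrom, c_start, c_end) in enumerate(crrs):
        for r_chrom, r_start, r_end in regions:
            if r_chrom != c_chrom:
                continue
            overlap = min(c_end, r_end) - max(c_start, r_start)
            if overlap > 0 and overlap / (c_end - c_start) >= min_overlap:
                hits.append(i)
                break  # one qualifying region is enough for this CRR
    return hits


def recovery_auc(ranking, hits, rank_cutoff):
    """Area under the recovery curve over the top rank_cutoff positions of one feature.

    ranking lists CRR indices from best to worst score for that feature.
    The result is scaled by (positions * number of input CRRs); the scaling is
    a constant per analysis, so it does not affect the NES.
    """
    hit_set = set(hits)
    top = ranking[:rank_cutoff]
    recovered = np.cumsum([crr in hit_set for crr in top])
    return recovered.sum() / (len(top) * len(hit_set))


def nes_scores(aucs):
    """Normalize AUCs across features as z-scores (assumed form of the NES)."""
    aucs = np.asarray(aucs, dtype=float)
    return (aucs - aucs.mean()) / aucs.std()


def enriched_features(names, aucs, threshold=3.0):
    """Return (name, NES) pairs for features whose NES exceeds the threshold."""
    nes = nes_scores(aucs)
    return [(name, float(score)) for name, score in zip(names, nes) if score > threshold]


# Toy example: four CRRs, three input peaks, two hypothetical feature rankings.
crrs = [("chr1", 0, 100), ("chr1", 200, 300), ("chr1", 400, 500), ("chr2", 0, 100)]
peaks = [("chr1", 50, 120), ("chr1", 390, 450), ("chr2", 90, 200)]
hits = map_regions_to_crrs(peaks, crrs)

rankings = {"motif_A": [0, 2, 1, 3], "track_B": [1, 3, 0, 2]}
aucs = {name: recovery_auc(r, hits, rank_cutoff=4) for name, r in rankings.items()}
```

In this toy setup `motif_A` ranks both input CRRs first and therefore gets the larger AUC. With only a handful of features no NES can exceed 3.0, which reflects why the normalization is meaningful only against a large ranking database.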
Human regulatory tracks included in the databases (A) and mouse regulatory tracks included in the databases (B)

(A) Human

| | ENCODE | Epigenome roadmap | Taipale | Aerts | ∑ |
|---|---|---|---|---|---|
| DHS | 467 | 390 | 0 | 0 | 857 |
| FAIRE | 37 | 0 | 0 | 14 | 51 |
| Histone | 402 | 1572 | 3 | 26 | 2003 |
| TF ChIP-seq | 1274 | 0 | 117 | 3 | 1394 |
| ∑ | 2180 | 1962 | 120 | 43 | 4305 |

(B) Mouse

| | ENCODE |
|---|---|
| DHS | 150 |
| FAIRE | 0 |
| Histone | 209 |
| TF ChIP-seq | 206 |
| ∑ | 565 |
Figure 2. Ewing sarcoma case study. Various i-cisTarget analyses with different types of input, all related to EWS-FLI1 targets in Ewing sarcoma. (a) All input datasets are derived from Riggi et al. (35) and include FLI1 ChIP-seq peaks, the top differentially less active peaks based on H3K27ac ChIP-seq upon EWS-FLI1 knockdown, differentially more open regions based on ATAC-seq after EWS-FLI1 activation and differentially downregulated genes after EWS-FLI1 knockdown (see ‘Materials and Methods’ section). (b) Input regions are automatically mapped to CRRs. (c) Each of the sets was analysed independently and, reassuringly, the expected EWS-FLI1 motif was ranked at the top, alongside regulatory tracks, mainly obtained on the SK-N-MC Ewing sarcoma cell line. Note that the rank of the motifs is represented by two values: the first is the rank of the cluster of similar motifs; the second, given in brackets, is the rank of the specific motif. (d) Distributions of AUC scores for a given input across all features in the selected databases (marked in purple in the tables in (c)), with an arrow indicating the top feature within that database. (e) The recovery curves for the top-ranked features within the database, where the leading edge (LE) indicates the number of highly ranked target regions. (f) UCSC Genome Browser screenshot showing an example of one direct target region (red arrowhead) in an intron of the gene APOH, which is included in the set of downregulated genes. This region was predicted as a target of EWS-FLI1 in the i-cisTarget analyses of the top less active H3K27ac peaks, the top 200 downregulated genes and the FLI1 ChIP-seq peaks. The specific binding site is represented by a cluster of EWS-FLI1 motifs (green arrowhead), which was generated by a subsequent i-cisTarget analysis in which the predicted target regions of EWS-FLI1 were scanned for CRMs of this factor.
All these tracks are represented on the screenshot (from top to bottom): the CRRs, the predicted cluster of EWS-FLI1 motifs, RNA-seq peaks in SK-N-MC and A673 after shFLI1 (two purple tracks, published in (35)) and control (two orange tracks, published in (35)), H3K27ac peaks in SK-N-MC and MSC cell lines expressing EWS-FLI1 (green tracks, published in (35)), FLI1 ChIP-seq track in SK-N-MC and A673 cell lines (blue tracks, published in (35)) as well as DHS on SK-N-MC which was found as the top track within non-TF regulatory tracks (black track, from ENCODE database (15)).
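The leading edge in panel (e) can be illustrated with a short Python sketch. The caption does not define how the leading edge is computed, so the rule used here is an assumption: the leading edge is taken as the rank at which the feature's recovery curve exceeds a background curve (mean plus two standard deviations across all features) by the largest margin. The function names and the toy rankings are hypothetical.

```python
import numpy as np


def recovery_curve(ranking, hits, rank_cutoff):
    """Cumulative number of input CRRs recovered along the top of one feature ranking."""
    hit_set = set(hits)
    return np.cumsum([crr in hit_set for crr in ranking[:rank_cutoff]])


def leading_edge(feature_curve, all_curves):
    """Return the 1-based rank with the largest margin over the background curve.

    Background = mean + 2 * sd of the recovery curves of all features
    (an assumed definition, used here for illustration only).
    """
    all_curves = np.asarray(all_curves, dtype=float)
    background = all_curves.mean(axis=0) + 2 * all_curves.std(axis=0)
    margin = np.asarray(feature_curve, dtype=float) - background
    return int(np.argmax(margin)) + 1


# Toy example: 10 CRRs, three of which (0, 1, 2) are in the input set.
hits = [0, 1, 2]
top_ranking = list(range(10))                        # recovers all hits immediately
background_ranking = [3, 4, 5, 0, 6, 7, 1, 8, 9, 2]  # recovers hits slowly

top_curve = recovery_curve(top_ranking, hits, rank_cutoff=10)
curves = [top_curve] + [recovery_curve(background_ranking, hits, 10) for _ in range(9)]

le_rank = leading_edge(top_curve, curves)
# Input CRRs ranked at or before the leading edge are reported as candidate direct targets.
direct_targets = [crr for crr in top_ranking[:le_rank] if crr in set(hits)]
```

Under this rule, input regions ranked after the leading edge are treated as less likely to be direct targets of the feature, which matches the caption's description of the LE as the number of highly ranked target regions.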