Shaojuan Li, Changxin Wan, Rongbin Zheng, Jingyu Fan, Xin Dong, Clifford A Meyer, X Shirley Liu.
Abstract
Characterizing the ontologies of genes directly regulated by a transcription factor (TF) can help to elucidate the TF's biological role. Previously, we developed a widely used method, BETA, to integrate TF ChIP-seq peaks with differential gene expression (DGE) data to infer direct target genes. Here, we provide Cistrome-GO, a website implementation of this method with enhanced features to conduct ontology analyses of gene regulation by TFs in human and mouse. Cistrome-GO has two working modes: solo mode, which analyzes ChIP-seq peaks alone, and ensemble mode, which integrates ChIP-seq peaks with DGE data. Cistrome-GO is freely available at http://go.cistrome.org/.
Year: 2019 PMID: 31053864 PMCID: PMC6602521 DOI: 10.1093/nar/gkz332
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of web servers for GO enrichment analysis of TF ChIP-seq data
| Web server | ChIP-seq peaks and genes association method | Measurement of genes as targets | Integrating expression | GO enrichment calculation | Ref |
|---|---|---|---|---|---|
| Cistrome-GO | RP score depending on the distance to TSS | continuous values | Yes | Minimum hypergeometric test | |
| GREAT | Within a gene’s regulatory domain | 0/1 assignments to genomic regions | No | Binomial test over genomic regions | |
| ChIP-Enrich | Option from: nearest TSS, nearest gene, ≤1 kb from TSS, ≤5 kb from TSS, etc. | 0/1 assignments | No | Wald test for logistic regression | |
| Enrichr | Nearest TSS | 0/1 assignments | No | Fisher’s exact test | |
Figure 1. The workflow of Cistrome-GO. (A) Schema of the RP score calculation for gene g. Two peaks are located near gene g, with peak-to-TSS genomic distances of d1 and d2. All k ChIP-seq peaks near the TSS of gene g are used in the RP score calculation (k = 2 in this figure). The pink area represents the decay function used in the RP score calculation. The parameter d0 is the decay distance of the peak weighting function. (B) The ensemble mode workflow. If the user uploads both TF ChIP-seq peak and DGE analysis files, Cistrome-GO performs an ensemble mode analysis that integrates the two types of data in three steps. Step 1: calculation of the adjusted RP score. Step 2: integration with differential expression data by rank product. Step 3: GO and pathway analysis based on gene ranking. Given a GO or KEGG term j, the gene ranking (with high-ranking genes represented in bright pink) is translated into a series of 1s and 0s, which indicate the presence or absence of each ranked gene in the jth term. The mHG test is applied to this series to assess whether the 1s tend to occur near the top of the ranked gene list.
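The three steps in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the Cistrome-GO implementation: the weighting form `2 ** (-d / decay_bp)`, the 10 kb default decay distance, the normalization of ranks by the number of genes, and all function and gene names are assumptions made for illustration, and the "adjusted" RP score of Step 1 is simplified to a plain RP score. The last function returns only the minimum hypergeometric statistic, not the multiple-testing-corrected mHG P-value.

```python
from math import comb


def rp_score(distances_bp, decay_bp=10_000):
    """Regulatory potential of one gene: sum of distance-decayed peak weights.

    The weight 2 ** (-d / decay_bp) is an assumed decay form; a peak at the
    TSS contributes 1 and a peak at one decay distance contributes 0.5.
    """
    return sum(2 ** (-abs(d) / decay_bp) for d in distances_bp)


def ranks_desc(values):
    """Rank genes 1..N, largest value first (ties broken by gene name)."""
    order = sorted(values, key=lambda g: (-values[g], g))
    return {g: i + 1 for i, g in enumerate(order)}


def rank_product(rp, de):
    """Combine the two rankings; a smaller product means a likelier target."""
    r_rp, r_de = ranks_desc(rp), ranks_desc(de)
    n = len(rp)
    return {g: (r_rp[g] / n) * (r_de[g] / n) for g in rp}


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n of N genes are drawn and K of the N are in the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def mhg_statistic(indicator):
    """Minimum hypergeometric tail over every top-n cutoff of a 0/1 series."""
    N, K = len(indicator), sum(indicator)
    best, hits = 1.0, 0
    for n, x in enumerate(indicator, start=1):
        hits += x
        best = min(best, hypergeom_tail(N, K, n, hits))
    return best


# Hypothetical toy input: peak-to-TSS distances (bp) and a DGE significance score.
peaks = {"geneA": [500, 2000], "geneB": [15_000], "geneC": [50_000], "geneD": []}
dge = {"geneA": 5.0, "geneB": 1.0, "geneC": 3.0, "geneD": 0.2}

rp = {g: rp_score(d) for g, d in peaks.items()}          # Step 1 (simplified)
combined = rank_product(rp, dge)                          # Step 2
ranking = sorted(combined, key=lambda g: (combined[g], g))
term_genes = {"geneA", "geneB"}                           # hypothetical term j
indicator = [1 if g in term_genes else 0 for g in ranking]
statistic = mhg_statistic(indicator)                      # Step 3
```

In solo mode the same Step 3 would be applied to a ranking by RP score alone, since no DGE data are available.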
Figure 2. The performance of Cistrome-GO in solo mode. (A) Comparison of the performance of Cistrome-GO and other web servers (GREAT, ChIP-Enrich and Enrichr) on 256 TF ChIP-seq datasets. For each web server, the most significant 5000, 10 000, 15 000, 20 000 and all ChIP-seq peaks were evaluated separately. The MSS score quantifies the similarity between the web server predictions and the standard set of GO terms for each TF. (B) The percentage of the most significant E2F1 ChIP-seq peaks (top 2000, 5000, 10 000 and all peaks separately; ranked by −log10(P-value)) that are close (<1 kb) to a TSS. (C) The top 10 enriched BP terms for 10 000 E2F1 ChIP-seq peaks using 1 kb as the decay distance. The BP terms indicated by red arrows are those relevant to cell-cycle functions. The color gradient represents the number of genes in each GO term.
Figure 3. The performance of Cistrome-GO in ensemble mode, with STAT3 ChIP-seq and STAT3 knock-down DGE data as input. (A) The top 10 enriched BP terms in ensemble mode. (B) The top 10 enriched BP terms for STAT3 in solo mode. In both panels, the BP terms indicated by red arrows are those relevant to STAT3’s reported functions. The color gradient represents the number of genes in each GO term.