Chia-Chun Yang, Erik H Andrews, Min-Hsuan Chen, Wan-Yu Wang, Jeremy J W Chen, Mark Gerstein, Chun-Chi Liu, Chao Cheng.
Abstract
BACKGROUND: Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) or microarray hybridization (ChIP-chip) has been widely used to determine the genomic occupancy of transcription factors (TFs). We previously developed a probabilistic method, called TIP (Target Identification from Profiles), to identify TF target genes from ChIP-seq/ChIP-chip data. To achieve high specificity, TIP uses a conservative method to estimate the significance of target genes, at the cost of relatively low sensitivity of target gene identification compared with other methods. In addition, TIP's output does not report binding-peak locations or intensities, information that is highly useful for visualization and for experimental biology in general, and the variability of ChIP-seq/ChIP-chip file formats has made preparing input for TIP more difficult than desired.
DESCRIPTION: To address these limitations, we present a refined TIP with key extensions. First, it implements a Gaussian mixture model for p-value estimation, which increases the sensitivity of target gene identification and more accurately captures the shape of TF binding profile distributions. Second, it incorporates TF binding-peak data by locating peaks in the promoter regions of significant target genes and quantifying their strengths. Finally, for ease of use, we have incorporated it into a web server (http://syslab3.nchu.edu.tw/iTAR/) that accepts flexible input file formats, supports multiple species and genome assembly versions, and is freely available for public use. The web server additionally performs GO enrichment analysis on the identified target genes to reveal the potential function of the corresponding TF.
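The abstract states that the refined TIP fits a Gaussian mixture model to estimate p-values but does not give the fitting details. The sketch below shows one way this could work, assuming a two-component mixture fitted to the normalized regulatory scores by expectation-maximization (EM), with the lower-mean component treated as the background of non-target genes. All function names are hypothetical. For comparison, it also computes one-sided p-values under a single normal distribution.

```python
import math

import numpy as np


def normal_pdf(x, mu, sigma):
    """Gaussian density N(mu, sigma^2), evaluated element-wise over x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def upper_tail(z):
    """P(Z > z) for a standard normal Z, element-wise."""
    z = np.asarray(z, dtype=float)
    return 0.5 * np.vectorize(math.erfc, otypes=[float])(z / math.sqrt(2.0))


def fit_two_gaussians(z, n_iter=500, tol=1e-9):
    """Fit a two-component 1-D Gaussian mixture with EM; returns (weights, means, sds)."""
    z = np.asarray(z, dtype=float)
    # Hypothetical initialisation: background near the median, second component in the upper tail.
    w = np.array([0.9, 0.1])
    mu = np.array([np.median(z), np.quantile(z, 0.95)])
    sigma = np.array([z.std(), z.std()])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each gene.
        dens = np.stack([w[k] * normal_pdf(z, mu[k], sigma[k]) for k in range(2)])
        total = np.maximum(dens.sum(axis=0), 1e-300)
        resp = dens / total
        ll = np.log(total).sum()
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=1)
        w = nk / z.size
        mu = (resp * z).sum(axis=1) / nk
        sigma = np.maximum(np.sqrt((resp * (z - mu[:, None]) ** 2).sum(axis=1) / nk), 1e-6)
        if ll - prev_ll < tol:  # stop once the log-likelihood stops improving
            break
        prev_ll = ll
    return w, mu, sigma


def single_normal_pvalues(z):
    """One-sided p-values assuming all scores follow one normal distribution."""
    z = np.asarray(z, dtype=float)
    return upper_tail((z - z.mean()) / z.std())


def mixture_pvalues(z):
    """One-sided p-values measured against the lower-mean (assumed background) component."""
    z = np.asarray(z, dtype=float)
    _, mu, sigma = fit_two_gaussians(z)
    bg = int(np.argmin(mu))
    return upper_tail((z - mu[bg]) / sigma[bg])
```

Because true targets inflate the overall variance, a single normal model tends to assign larger p-values to genuinely bound genes; measuring significance against the narrower background component is what raises sensitivity in this sketch.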
Keywords: ChIP-chip; ChIP-seq; Gaussian mixture model; Gene ontology analysis; Transcription factor
Year: 2016 PMID: 27519564 PMCID: PMC4983039 DOI: 10.1186/s12864-016-2963-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 An overview of the iTAR web server. a The portal page of the iTAR web server: wig, bigwig, and bedgraph files compressed in rar or gz format can be uploaded. b The waiting page: after a wiggle file is uploaded, a waiting page is shown to the user and updates the job status every 5 s. In this example, we used the ENCODE STAT3 ChIP-seq data from HeLa-S3 cells. c The characteristic binding profile of an example TF: an aggregation plot of TF binding in a 20 kb DNA region centered at the transcription start site (TSS). This plot can be used for a rough evaluation of data quality; in general, we expect to see a peak near the TSS. d The density plot of normalized regulatory scores (Z scores). For each gene, the iTAR server calculates a regulatory score that measures the binding strength of the TF to that gene. The regulatory scores of all genes are then normalized, and p-values are calculated based on either a single normal distribution model or a mixture normal distribution model. e A list of significant target genes and the associated TF binding sites: significant RefSeq genes, gene symbols, and p-values are shown. P-value 1 is estimated with the mixture normal distribution model, and P-value 2 with the single normal distribution model. The summit location is the position of the binding-peak summit relative to the TSS. f GO enrichment analysis of the target genes.
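Panels c and d describe an aggregation profile around the TSS and a per-gene regulatory score. The caption does not specify how the score is computed, so the following is a minimal sketch under stated assumptions: per-base signal is held in NumPy arrays keyed by chromosome, windows on the minus strand are reversed so every row reads 5' to 3', and each gene's score is a weighted sum of its window with weights proportional to the average (characteristic) profile. The functions `tss_windows` and `regulatory_z_scores` are hypothetical.

```python
import numpy as np


def tss_windows(signal, tss_list, half_width):
    """Stack per-base signal around each TSS, oriented 5'->3' relative to the gene.

    signal: dict mapping chromosome name -> 1-D array of per-base coverage.
    tss_list: iterable of (chrom, tss_position, strand) tuples, 0-based positions.
    """
    rows = []
    for chrom, tss, strand in tss_list:
        track = signal[chrom]
        lo, hi = tss - half_width, tss + half_width + 1
        if lo < 0 or hi > track.size:
            continue  # skip genes whose window runs off the chromosome end
        window = np.asarray(track[lo:hi], dtype=float)
        rows.append(window[::-1] if strand == "-" else window)
    return np.vstack(rows)


def regulatory_z_scores(windows):
    """Profile-weighted regulatory scores and their Z-score normalisation."""
    profile = windows.mean(axis=0)     # characteristic (aggregation) profile
    weights = profile / profile.sum()  # assumption: weights proportional to the profile
    scores = windows @ weights         # one regulatory score per gene
    z = (scores - scores.mean()) / scores.std()
    return profile, scores, z
```

For the 20 kb region in panel c, `half_width` would be 10,000 bases; binning the signal before stacking would keep the matrix small for genome-wide use.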
Fig. 2 The STAT3 binding profiles in the promoter regions of six histone genes. The binding profiles were generated from STAT3 ChIP-seq data in HeLa-S3 cells from the ENCODE project. We selected six histone genes from the STAT3 target genes identified by the TIP algorithm. The red rectangles indicate peaks called by the PeakSeq method. a PeakSeq calls no significant peak, but TIP identifies the genes as target genes. b PeakSeq calls significant peaks in the promoter regions, and TIP also identifies the genes as target genes.
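Panel e of Fig. 1 reports the summit location of each binding peak relative to the TSS, and Fig. 2 contrasts genes with and without PeakSeq peaks in their promoters. A minimal sketch of that lookup follows, assuming peaks are given as half-open intervals with a summit coordinate and a signal value, and assuming a hypothetical promoter window of 1 kb on each side of the TSS (the captions do not state one). Offsets are strand-aware, so a positive offset means downstream of the TSS. The `Peak` type and `peaks_in_promoter` function are illustrative, not part of iTAR.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    summit: int    # absolute coordinate of the peak summit
    signal: float  # peak strength reported by the peak caller


def peaks_in_promoter(
    peaks: Iterable[Peak], chrom: str, tss: int, strand: str,
    upstream: int = 1000, downstream: int = 1000,
) -> List[Tuple[int, float]]:
    """Return (summit offset from TSS, signal) for peaks overlapping the promoter, strongest first."""
    # Promoter window [lo, hi] in genomic coordinates, flipped for minus-strand genes.
    if strand == "+":
        lo, hi = tss - upstream, tss + downstream
    else:
        lo, hi = tss - downstream, tss + upstream
    hits = []
    for p in peaks:
        if p.chrom != chrom or p.end <= lo or p.start > hi:
            continue  # no overlap with the promoter window
        offset = p.summit - tss if strand == "+" else tss - p.summit
        hits.append((offset, p.signal))
    return sorted(hits, key=lambda h: -h[1])
```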
Fig. 3 Comparison of target genes identified using the single normal distribution and the mixture normal distribution. We used NFE2 ChIP-seq data and gene expression data from K562 cells treated with NFE2 shRNA. a NFE2 target genes identified by three methods: a simple method, TIP with a single normal distribution, and TIP with a mixture normal distribution. The simple method identifies target genes with a conventional peak-based approach; it identified 1597 target genes. For TIP, various FDR thresholds (x axis) were used to select significant NFE2 target genes. The y axis shows the number of target genes. b We calculated the expression change (absolute log ratio) of every gene in shRNA-treated cells relative to untreated cells and sorted the genes in decreasing order of absolute log ratio. Genes on the left have larger absolute log ratios (WT vs. shRNA) and are therefore more responsive to NFE2 regulation. For a given absolute log ratio threshold (x axis), the y axis shows the number of each method's target genes that satisfy the threshold.
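Panel a of Fig. 3 counts target genes at several FDR thresholds, and panel b counts how many of each method's targets pass an absolute log ratio cutoff. The caption does not name the FDR procedure, so this sketch assumes the Benjamini-Hochberg method applied to per-gene p-values; `bh_qvalues`, `targets_per_fdr`, and `responsive_target_counts` are hypothetical helpers.

```python
import numpy as np


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking running minima from the largest p-value down.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def targets_per_fdr(genes, pvalues, thresholds):
    """Map each FDR threshold to the genes whose q-value does not exceed it (panel a)."""
    q = bh_qvalues(pvalues)
    return {t: [g for g, qi in zip(genes, q) if qi <= t] for t in thresholds}


def responsive_target_counts(log_ratio, targets, cutoffs):
    """For each |log ratio| cutoff, count targets whose |log ratio| meets it (panel b)."""
    return [sum(1 for g in targets if abs(log_ratio[g]) >= c) for c in cutoffs]
```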