
CEAS: cis-regulatory element annotation system.

Xuwo Ji, Wei Li, Jun Song, Liping Wei, X Shirley Liu.

Abstract

The recent availability of high-density human genome tiling arrays enables biologists to conduct ChIP-chip experiments to locate the in vivo-binding sites of transcription factors in the human genome and explore the regulatory mechanisms. Once genomic regions enriched by transcription factor ChIP-chip are located, genome-scale downstream analyses are crucial but difficult for biologists without strong bioinformatics support. We designed and implemented the first web server to streamline the ChIP-chip downstream analyses. Given genome-scale ChIP regions, the cis-regulatory element annotation system (CEAS) retrieves repeat-masked genomic sequences, calculates GC content, plots evolutionary conservation, maps nearby genes and identifies enriched transcription factor-binding motifs. Biologists can utilize CEAS to retrieve useful information for ChIP-chip validation, assemble important knowledge to include in their publication and generate novel hypotheses (e.g. transcription factor cooperative partner) for further study. CEAS helps the adoption of ChIP-chip in mammalian systems and provides insights towards a more comprehensive understanding of transcriptional regulatory mechanisms. The URL of the server is http://ceas.cbi.pku.edu.cn.

Year:  2006        PMID: 16845068      PMCID: PMC1538818          DOI: 10.1093/nar/gkl322

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Chromatin immunoprecipitation coupled with DNA microarrays (ChIP–chip) has become a popular technique to identify genome-wide in vivo protein–DNA interactions. With the recent availability of commercial human genome tiling microarrays, many laboratories are starting to combine these two technologies to detect cis-regulatory elements in the human genome. Despite the importance of ChIP–chip, there is still a shortage of convenient tools that streamline the downstream analyses and can process genome-scale ChIP regions. So far, all the ChIP–chip papers in mammalian systems have been published as a direct result of powerful bioinformatics support (1–6), which may not be available to smaller labs. Therefore, web servers that can perform comprehensive analyses of hundreds or thousands of ChIP regions are not only valuable to biologists, but also useful for promoting the adoption of this powerful technology. We present a comprehensive cis-regulatory element annotation system (CEAS) web server that integrates useful tools for sequence analysis and annotation of ChIP regions in the human genome. CEAS results not only help biologists analyze and validate their ChIP regions, but can also be directly included in their manuscripts or Supplementary Data.

WEB APPLICATION

CEAS is composed of three parts: (i) a front-end web-based user interface for input data submission, input data validation and job scheduling; (ii) an annotation engine for sequence analysis and annotations; and (iii) a reporting system for output generation and Email notification to the user.

User input

CEAS accepts an input file with ChIP regions in either UCSC BED format or Sanger GFF format. Standard BED files have three required fields: chrom for the chromosome name, chromStart for the starting position of a ChIP region on the chromosome and chromEnd for the ending position of the ChIP region on the chromosome. The chromosome coordinates of the ChIP regions should follow the human genome assembly version Build 35 (Hg17). Coordinates based on an earlier genome assembly can be converted to Hg17 using the Batch Coordinate Conversion at the UCSC genome browser (7). A unique identifier for every ChIP region, ordinarily an optional fourth field in BED files, is also required by CEAS. Because sequence analysis and annotation for genome-scale ChIP regions are time consuming, CEAS requires the user to supply an Email address. After submission, the server places each job in a queue and Emails the user once the computation is finished. Alternatively, if the user inputs ‘guest’ instead of an Email address, the server returns a confirmation page, which is redirected to a result page when the annotation is finished. The output files are stored on the server for 3 days to ensure that the user has enough time to browse and download the results.
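The input requirements above can be illustrated with a minimal Python sketch. It is not the CEAS validation code; the names `ChipRegion` and `parse_bed` are hypothetical, and the checks (four fields, sensible coordinates, unique identifiers) are only those implied by the description.

```python
from typing import List, NamedTuple


class ChipRegion(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str


def parse_bed(text: str) -> List[ChipRegion]:
    """Parse BED text, requiring chrom, chromStart, chromEnd and a unique ID."""
    regions = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines, comments and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(
                f"line {lineno}: need chrom, chromStart, chromEnd and an ID"
            )
        chrom, name = fields[0], fields[3]
        start, end = int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {lineno}: invalid coordinates {start}-{end}")
        if name in seen:
            raise ValueError(f"line {lineno}: duplicate region ID {name!r}")
        seen.add(name)
        regions.append(ChipRegion(chrom, start, end, name))
    return regions
```

A file with only the three standard BED fields is rejected by this sketch, mirroring the requirement that every ChIP region carry an identifier.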

Sequence retrieval

Although several websites can retrieve repeat-masked sequence for a particular genomic region, none can handle hundreds to thousands of ChIP regions simultaneously. Furthermore, current retrieval websites mask only RepeatMasker repeats and tandem repeats with a period of 12 or less (8). Tandem repeats with a period of >12 could greatly affect the qPCR primer design for ChIP region validation and sequence motif finding within the ChIP regions. CEAS automatically retrieves the genomic sequences of all the ChIP regions with all RepeatMasker repeats and all tandem repeats masked, and presents them in FASTA format for user download.
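As a rough illustration of the masking step, the sketch below replaces repeat intervals with `N` and formats the result as FASTA. The function names are hypothetical, the repeat intervals are assumed to be given relative to the region sequence, and hard masking with `N` is an assumption; the text does not say how CEAS represents masked bases.

```python
from typing import Iterable, Tuple


def mask_intervals(seq: str, repeats: Iterable[Tuple[int, int]]) -> str:
    """Replace bases in each half-open [start, end) repeat interval with 'N'."""
    masked = list(seq)
    for start, end in repeats:
        # Clamp intervals that extend beyond the retrieved sequence.
        start, end = max(start, 0), min(end, len(seq))
        for i in range(start, end):
            masked[i] = "N"
    return "".join(masked)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"
```

Passing both RepeatMasker intervals and tandem-repeat intervals of any period to `mask_intervals` corresponds to the fuller masking described above.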

Conservation plot

Comparative genomics has been widely used to identify cis-regulatory elements in higher eukaryotes (9), and thus biologists are often interested in knowing the level of conservation of the ChIP regions. CEAS uses the high-quality phastCons (10) information from the UCSC GoldenPath genome resource, which assigns a conservation score based on a phylogenetic hidden Markov model to virtually every nucleotide in the human genome. CEAS generates a thumbnail phastCons conservation plot for each ChIP region, allowing biologists to skim through hundreds of ChIP regions in a single PDF file. In addition, the server extends both ends of each ChIP region to 3 kb, calculates an average phastCons score for each position and generates an average conservation plot. This final conservation plot can give biologists an idea of how conserved their ChIP regions are (in the middle of the plot) compared to the genomic background (at both ends of the plot).
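The average conservation plot can be sketched as a position-wise mean over aligned windows. The sketch below assumes that each window is centred on the region midpoint and spans 3 kb on either side, which is one reading of the description rather than the documented CEAS behaviour. `score_at` is a hypothetical lookup that returns a per-base conservation score, or `None` where no score exists.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def average_profile(
    regions: Sequence[Tuple[str, int, int]],
    score_at: Callable[[str, int], Optional[float]],
    flank: int = 3000,
) -> List[Optional[float]]:
    """Average per-base scores over windows centred on each region midpoint."""
    width = 2 * flank + 1
    totals = [0.0] * width
    counts = [0] * width
    for chrom, start, end in regions:
        mid = (start + end) // 2
        for offset in range(width):
            score = score_at(chrom, mid - flank + offset)
            if score is not None:  # positions without a score are skipped
                totals[offset] += score
                counts[offset] += 1
    # Offsets with no scored position in any region stay undefined.
    return [t / c if c else None for t, c in zip(totals, counts)]
```

In the resulting profile, the central offsets correspond to the ChIP regions and the outer offsets to the flanking background.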

Nearby gene mapping

For each ChIP region, CEAS reports the nearest RefSeq genes in both upstream and downstream directions on both strands unless no gene is found within 300 kb. When a ChIP region lies within a gene, CEAS reports whether it is in the 5′-untranslated region (5′-UTR), 3′-UTR, a coding exon or an intron. For each ChIP region, CEAS provides its length, GC content and a link to the UCSC genome browser. The server also gives summary statistics for GC content and gene mapping of all the ChIP regions, including the percentages of ChIP regions that reside in proximal promoters (1 kb upstream from RefSeq 5′ start), immediate downstream (1 kb from RefSeq 3′ end), 5′-UTRs, 3′-UTRs, coding exons, introns and enhancers (>1 kb from RefSeq). This rough estimate of the ChIP region distribution helps biologists understand the specific binding behavior of their transcription factor.
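A simplified sketch of the per-region statistics follows. It assumes that GC content is computed over unmasked bases and that a region is assigned to a category by a single position such as its midpoint; neither detail is stated in the text. The function names are hypothetical, and the split of intragenic regions into UTRs, coding exons and introns is omitted because it needs the full exon structure.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unmasked (A/C/G/T) bases; 0.0 if there are none."""
    seq = seq.upper()
    known = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / known if known else 0.0


def classify(pos: int, tx_start: int, tx_end: int, strand: str,
             flank: int = 1000) -> str:
    """Classify a position against one gene spanning [tx_start, tx_end)."""
    if tx_start <= pos < tx_end:
        return "gene body"
    # Distances are measured in transcript orientation, so the strand
    # decides which side of the gene counts as upstream.
    left = tx_start - pos if pos < tx_start else 0
    right = pos - (tx_end - 1) if pos >= tx_end else 0
    upstream, downstream = (left, right) if strand == "+" else (right, left)
    if 0 < upstream <= flank:
        return "proximal promoter"
    if 0 < downstream <= flank:
        return "immediate downstream"
    return "enhancer"
```

Tallying the returned labels over all regions gives the kind of percentage summary described above.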

Motif finding and enrichment analysis

CEAS finds enriched sequence motifs in the ChIP regions that are putatively bound by the ChIP–chip transcription factor and its cooperative-binding partners. The current best de novo motif-finding methods for ChIP–chip include MEME (11), AlignACE (12), MDscan (13) and their combinations (14). For known motifs, the best approach is a TRANSFAC (15) or JASPAR (16) motif scan. Since scanning for known motifs is less time consuming and can be pre-computed, we decided to use it. CEAS pre-collected all the motif matrices in the TRANSFAC (15) and JASPAR (16) databases, and filtered out motifs from microbial genomes or constructed with <10 sites to get ∼800 well-characterized eukaryotic motifs. For each motif, CEAS pre-computed and stored all its hits (with information on chromosome, position, strand and score) in the fully repeat-masked human genomic sequence. The score of a particular w-mer hit to a motif of width w is calculated as follows:

$$\mathrm{Score} = \log \frac{P(\text{w-mer} \mid \text{motif})}{P(\text{w-mer} \mid \text{background})}$$

where the background is the 9th order nucleotide Markov dependency estimated from the human genomic sequence. A score cutoff of max(5, 0.9 × motif relative entropy) is used to call a motif a hit. The relative entropy of a motif of width w is calculated as

$$\mathrm{Relative\ Entropy} = \sum_{j=1}^{w} \sum_{i \in \{A,C,G,T\}} m_{ij} \log \frac{m_{ij}}{p_i}$$

where m_{ij} is the probability of seeing nucleotide i at position j, and p_i is the probability of i in the human genome. Given the user's ChIP regions, CEAS counts the number of hits for every motif both within the ChIP regions and in the whole genome. To be comprehensive, CEAS uses relatively lenient criteria of >1.5-fold change and binomial test P-value <1E−5 to report motifs enriched in the ChIP regions. Reported motifs are ranked by their P-values so that biologists can refine the motif list with a more stringent cutoff. With each reported motif, CEAS provides its fold change, P-value, hit sequences in the ChIP regions and sequence logo (17).
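The scoring and enrichment steps can be sketched as follows. This is an illustration under several assumptions, not the CEAS implementation: the score is taken to be a log-likelihood ratio of motif versus background, the background is simplified to independent base frequencies instead of a higher-order Markov model, logarithms are base 2, and the binomial test treats each genome-wide hit as falling in a ChIP region with probability equal to the fraction of the genome the regions cover. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Column = Dict[str, float]  # probability of each base at one motif position


def relative_entropy(motif: List[Column], bg: Dict[str, float]) -> float:
    """Sum over positions j and bases i of m_ij * log2(m_ij / p_i)."""
    return sum(m * math.log2(m / bg[b])
               for col in motif for b, m in col.items() if m > 0)


def score(wmer: str, motif: List[Column], bg: Dict[str, float]) -> float:
    """Log-likelihood ratio of a w-mer under the motif vs. the background.

    Motif probabilities must be non-zero (e.g. after adding pseudocounts).
    """
    return sum(math.log2(col[b] / bg[b]) for b, col in zip(wmer, motif))


def is_hit(wmer: str, motif: List[Column], bg: Dict[str, float]) -> bool:
    """Apply the cutoff max(5, 0.9 * relative entropy) quoted in the text."""
    return score(wmer, motif, bg) >= max(5.0, 0.9 * relative_entropy(motif, bg))


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))
    return min(1.0, sum(math.exp(log_pmf(i)) for i in range(k, n + 1)))


def enrichment(hits_in_regions: int, hits_in_genome: int,
               region_bp: int, genome_bp: int) -> Tuple[float, float]:
    """Return (fold change, binomial P-value) for one motif."""
    p = region_bp / genome_bp  # chance that a random hit lands in a region
    fold = (hits_in_regions / hits_in_genome) / p
    return fold, binom_upper_tail(hits_in_regions, hits_in_genome, p)


def is_enriched(fold: float, pval: float) -> bool:
    """Reporting thresholds quoted in the text."""
    return fold > 1.5 and pval < 1e-5
```

Sorting the motifs that pass `is_enriched` by P-value gives a ranked list that can be tightened with a stricter cutoff.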

Example output

Without other jobs pending on the queue, it takes CEAS ∼20 min to process an input with 1000 ChIP regions, each of length ∼600 bp. Once the computation is finished, CEAS notifies the user by Email with a link to the result page. The result HTML page reports each of the CEAS analysis results in a separate section for the user to view and download (Figure 1).

Figure 1. CEAS sample output. The top window contains links to each of the analysis results. Excerpts from the result sections are shown in the blue callouts in counter-clockwise order as genomic sequence of the ChIP regions in FASTA format, average conservation plot of the ChIP regions, sequence logo of an enriched motif, motif site list with fold change and P-values and summary of nearby gene mapping of all the ChIP regions.

DISCUSSION

CEAS is the first web server that allows high-throughput and comprehensive downstream analyses of human ChIP–chip data. The sequence retrieval function helps biologists design qPCR primers for validation and perform motif finding. The conservation plot function explores the functional conservation of the ChIP–chip transcription factor, which could potentially be used to refine motif search. The nearby gene mapping function predicts the genes regulated by the transcription factor-bound regions. The motif finding function predicts the putative binding motif of the ChIP–chip transcription factor, which further validates the ChIP regions. It also predicts the cooperative-binding partners of the transcription factor. Many of the CEAS results can be directly incorporated in the user's ChIP–chip manuscript or Supplementary Data. ChIP–chip on genome tiling arrays is still in its infancy. We are very lucky to work with the pioneers in this field and to foresee the analysis tools that other ChIP–chip laboratories will need. As tiling arrays of other eukaryotic genomes become available and more biologists adopt the ChIP–chip technology, we envision CEAS including more organisms and offering more and friendlier functionalities, such as qPCR primer design for each ChIP region, motif scan for user-provided motifs or de novo motif discovery.

REFERENCES (10 of 17 shown)

1.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae.

Authors:  J D Hughes; P W Estep; S Tavazoie; G M Church
Journal:  J Mol Biol       Date:  2000-03-10       Impact factor: 5.469

2.  The UCSC Table Browser data retrieval tool.

Authors:  Donna Karolchik; Angela S Hinrichs; Terrence S Furey; Krishna M Roskin; Charles W Sugnet; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  JASPAR: an open-access database for eukaryotic transcription factor binding profiles.

Authors:  Albin Sandelin; Wynand Alkema; Pär Engström; Wyeth W Wasserman; Boris Lenhard
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

4.  A suite of web-based programs to search for transcriptional regulatory motifs.

Authors:  Yueyi Liu; Liping Wei; Serafim Batzoglou; Douglas L Brutlag; Jun S Liu; X Shirley Liu
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

5.  WebLogo: a sequence logo generator.

Authors:  Gavin E Crooks; Gary Hon; John-Marc Chandonia; Steven E Brenner
Journal:  Genome Res       Date:  2004-06       Impact factor: 9.043

6.  Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs.

Authors:  Simon Cawley; Stefan Bekiranov; Huck H Ng; Philipp Kapranov; Edward A Sekinger; Dione Kampa; Antonio Piccolboni; Victor Sementchenko; Jill Cheng; Alan J Williams; Raymond Wheeler; Brant Wong; Jorg Drenkow; Mark Yamanaka; Sandeep Patel; Shane Brubaker; Hari Tammana; Gregg Helt; Kevin Struhl; Thomas R Gingeras
Journal:  Cell       Date:  2004-02-20       Impact factor: 41.582

7.  Core transcriptional regulatory circuitry in human embryonic stem cells.

Authors:  Laurie A Boyer; Tong Ihn Lee; Megan F Cole; Sarah E Johnstone; Stuart S Levine; Jacob P Zucker; Matthew G Guenther; Roshan M Kumar; Heather L Murray; Richard G Jenner; David K Gifford; Douglas A Melton; Rudolf Jaenisch; Richard A Young
Journal:  Cell       Date:  2005-09-23       Impact factor: 41.582

8.  CREB binds to multiple loci on human chromosome 22.

Authors:  Ghia Euskirchen; Thomas E Royce; Paul Bertone; Rebecca Martone; John L Rinn; F Kenneth Nelson; Fred Sayward; Nicholas M Luscombe; Perry Miller; Mark Gerstein; Sherman Weissman; Michael Snyder
Journal:  Mol Cell Biol       Date:  2004-05       Impact factor: 4.272

9.  Transcriptional regulatory code of a eukaryotic genome.

Authors:  Christopher T Harbison; D Benjamin Gordon; Tong Ihn Lee; Nicola J Rinaldi; Kenzie D Macisaac; Timothy W Danford; Nancy M Hannett; Jean-Bosco Tagne; David B Reynolds; Jane Yoo; Ezra G Jennings; Julia Zeitlinger; Dmitry K Pokholok; Manolis Kellis; P Alex Rolfe; Ken T Takusagawa; Eric S Lander; David K Gifford; Ernest Fraenkel; Richard A Young
Journal:  Nature       Date:  2004-09-02       Impact factor: 49.962

10.  TRANSFAC: transcriptional regulation, from patterns to profiles.

Authors:  V Matys; E Fricke; R Geffers; E Gössling; M Haubrock; R Hehl; K Hornischer; D Karas; A E Kel; O V Kel-Margoulis; D-U Kloos; S Land; B Lewicki-Potapov; H Michael; R Münch; I Reuter; S Rotert; H Saxel; M Scheer; S Thiele; E Wingender
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971
