Cyclebase.org--a comprehensive multi-organism online database of cell-cycle experiments.

Nicholas Paul Gauthier, Malene Erup Larsen, Rasmus Wernersson, Ulrik de Lichtenberg, Lars Juhl Jensen, Søren Brunak, Thomas Skøt Jensen.

Abstract

The past decade has seen the publication of a large number of cell-cycle microarray studies and many more are in the pipeline. However, data from these experiments are not easy to access, combine and evaluate. We have developed a centralized database with an easy-to-use interface, Cyclebase.org, for viewing and downloading these data. The user interface facilitates searches for genes of interest as well as downloads of genome-wide results. Individual genes are displayed with graphs of expression profiles throughout the cell cycle from all available experiments. These expression profiles are normalized to a common timescale to enable inspection of the combined experimental evidence. Furthermore, state-of-the-art computational analyses provide key information on both individual experiments and combined datasets such as whether or not a gene is periodically expressed and, if so, the time of peak expression. Cyclebase is available at http://www.cyclebase.org.

Year:  2007        PMID: 17940094      PMCID: PMC2238932          DOI: 10.1093/nar/gkm729

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The cell division cycle is one of the most fundamental processes of life, allowing cells to multiply and faithfully pass on their genetic information to future generations. The full complexity of this process became apparent a decade ago with the first genome-wide microarray studies of the mitotic cell cycle of budding yeast (1,2). Since then, numerous other microarray studies have been published on the cell cycle of the budding yeast Saccharomyces cerevisiae (3,4), the fission yeast Schizosaccharomyces pombe (5–7), human (8) and the plant Arabidopsis thaliana (9). Accessing, analyzing and comparing these many datasets has unfortunately remained difficult for a variety of reasons. First, there is no single database from which one can download all the datasets in a unified file format. The expression profiles for each experiment are often stored on individual websites. Second, the same gene identifiers are not used across datasets, making it difficult to compare expression profiles from different studies on the same organism. Third, a variety of different methods have, with varying success, been used for identifying the significantly regulated genes (1–28). The use of many different algorithms has introduced uncertainty as to which is the correct set of cell-cycle regulated genes. Fourth, new experimental studies tend to disregard already existing expression data, and thus only evaluate cell-cycle regulation based on their own experiments. Finally, general microarray repositories, analysis methods and visualization tools have by nature not been designed to meet the specific needs of the cell-cycle community.

Here, we present Cyclebase.org, a database and web resource of cell-cycle microarray expression datasets (see Table 1 for an overview of the datasets included in Cyclebase). These datasets have been mapped to common gene identifiers and normalized onto a common timescale, facilitating direct comparison of expression profiles between all experiments within an organism. The web interface provides a good visual overview of all available expression data on a given gene, as well as the results from state-of-the-art computational analyses. This interface aids the user in interpreting the combined evidence on the cell-cycle regulation of a given gene.
Table 1.

Summary of cell-cycle microarray experiments in Cyclebase

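Organism                   Group                      Microarray  Samples  Cycles  Experiment name
Saccharomyces cerevisiae   Cho et al. (1)             Affymetrix  17       2       Cho-cdc28
Saccharomyces cerevisiae   Spellman et al. (2)        Spotted     18       2       Spellman-alpha
Saccharomyces cerevisiae   Spellman et al. (2)        Spotted     24       2.5     Spellman-cdc15
Saccharomyces cerevisiae   de Lichtenberg et al. (3)  Geniom one  16       2       de Lichtenberg-cdc15
Saccharomyces cerevisiae   Pramilla et al. (4)        Spotted     25       2       Pramilla-alpha30
Saccharomyces cerevisiae   Pramilla et al. (4)        Spotted     25       2       Pramilla-alpha38
Schizosaccharomyces pombe  Rustici et al. (5)         Spotted     20       2       Rustici-cdc25-1
Schizosaccharomyces pombe  Rustici et al. (5)         Spotted     18       2       Rustici-cdc25-2
Schizosaccharomyces pombe  Rustici et al. (5)         Spotted     20       2       Rustici-elu1
Schizosaccharomyces pombe  Rustici et al. (5)         Spotted     20       2       Rustici-elu2
Schizosaccharomyces pombe  Rustici et al. (5)         Spotted     20       2       Rustici-elu3
Schizosaccharomyces pombe  Peng et al. (6)            Spotted     37       2       Peng-cdc25
Schizosaccharomyces pombe  Peng et al. (6)            Spotted     32       2       Peng-elu
Schizosaccharomyces pombe  Oliva et al. (7)           Spotted     52       3       Oliva-cdc25
Schizosaccharomyces pombe  Oliva et al. (7)           Spotted     50       2.5     Oliva-eluA
Schizosaccharomyces pombe  Oliva et al. (7)           Spotted     33       3       Oliva-eluB
Homo sapiens               Whitfield et al. (8)       Spotted     11       2       Whitfield-thythy1
Homo sapiens               Whitfield et al. (8)       Spotted     26       3       Whitfield-thythy2
Homo sapiens               Whitfield et al. (8)       Spotted     47       3       Whitfield-thythy3
Homo sapiens               Whitfield et al. (8)       Spotted     19       2       Whitfield-thynoc
Arabidopsis thaliana       Menges et al. (9)          Affymetrix  10       1       Menges-aph
Arabidopsis thaliana       Menges et al. (9)          Affymetrix  6        0.5     Menges-suc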

The table summarizes the experiments currently in Cyclebase. Group refers to the original publication on the data. Microarray lists the technology platform used: single-channel Affymetrix GeneChips (‘Affymetrix’), two-channel spotted cDNA microarrays (‘Spotted’), or in-situ synthesized arrays using the Geniom one platform (‘Geniom one’). Samples denotes the number of samples or time points included in the experiment. Cycles is an estimate of the number of full cell cycles covered by the experiment. Experiment name refers to the label used in Cyclebase for the experiment in question. Please note that the technical replicates by Pramilla et al. (4) are treated as independent experiments, because this leads to better overall performance of the analysis methods.

PRESENTING CYCLEBASE

The interface of Cyclebase is designed to make it as simple as possible for users to find and browse the genes of interest. Searching for key terms such as standard gene names (e.g. HTA2), systematic names (e.g. YBL003C) or descriptions (e.g. histone) will produce a list of candidate genes for inspection. Genes in this list are initially sorted by their match to the search criteria and then in ascending order on the cell-cycle rank score (most periodic genes at the top). The list can be sorted on any of the other columns simply by clicking them. In addition, an advanced search page allows the user to browse for genes that match certain criteria; for example, it allows researchers to find among the 100 most periodic human genes, those that peak in S-phase. When a gene of interest has been selected, or if a query is entered that matches only a single gene, the user is taken to the Gene Details page (Figure 1). This page is the primary interface for viewing expression profiles, key results from statistical analyses and general information about the gene in question. By default, the statistical results are based on all available experiments. Expression profiles and analysis results for the individual experiments can be accessed by clicking on a single experiment in the experiments list (Figure 1A).
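The ordering and filtering behaviour described above can be illustrated with a small sketch. It is not the Cyclebase implementation: the record fields (`match_quality`, `peak_phase`), the function names and the example genes are all hypothetical and were chosen only to mirror the description of the search list (sorted by match, then by ascending cell-cycle rank) and of the advanced search (genes peaking in a given phase among the N most periodic genes).

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    name: str           # gene name as shown in the result list
    match_quality: int  # hypothetical: 0 = best match to the query, larger = weaker
    rank: int           # cell-cycle rank, 1 = most periodic gene
    peak_phase: str     # hypothetical phase label derived from the peaktime


def order_hits(hits):
    """Sort by match to the query first, then by ascending cell-cycle rank."""
    return sorted(hits, key=lambda h: (h.match_quality, h.rank))


def top_periodic_in_phase(hits, phase, top_n=100):
    """Among the top_n most periodic genes, keep those peaking in `phase`."""
    most_periodic = sorted(hits, key=lambda h: h.rank)[:top_n]
    return [h for h in most_periodic if h.peak_phase == phase]


# Made-up records, for illustration only.
hits = [
    GeneHit("GENE_A", 1, 40, "S"),
    GeneHit("GENE_B", 0, 300, "G1"),
    GeneHit("GENE_C", 0, 12, "S"),
    GeneHit("GENE_D", 1, 150, "S"),
]
ordered_names = [h.name for h in order_hits(hits)]
s_phase_top2 = [h.name for h in top_periodic_in_phase(hits, "S", top_n=2)]
```

Sorting on a tuple key keeps the two criteria independent: rank only breaks ties between hits that match the query equally well.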
Figure 1.

Screenshot for budding yeast CLB1. The figure shows the Gene Details Page for the gene CLB1 (a cyclin). (A) The list of experiments in which the gene is measured. Clicking any of these takes the user to another Gene Details Page with only data from that particular experiment. (B) Expression profile chart. The experiments are normalized and aligned onto a common time-scale (in percent of the cell cycle). The individual phases are marked along the time axis and the computationally determined peaktime is marked by a red dot. (C) Summary of the computational analysis based on all data available for this gene in Cyclebase. ‘Rank’ signifies that this is the 78th most periodic gene in budding yeast, ‘P(per)’ and ‘P(reg)’ are P-values that quantify the significance of periodicity and regulation, respectively, and ‘peaktime’ estimates how far into the cell cycle (from M/G1) the gene is maximally expressed. (D) Schematic illustration of the peaktime (red dot) and phase duration. The gene CLB1 peaks 63% into the cell cycle, corresponding to the middle of G2 phase in budding yeast. (E) Gene aliases and description. (F) Download of data in various formats. (G) Database documentation and download.

To allow for inspection of the accumulated evidence for transcriptional regulation during the cell cycle, all available expression data for a gene of interest are depicted in the expression profile chart (Figure 1B). Easy comparison of different experiments is obtained by placing each profile onto a common time scale, which we have chosen to be in percent of the cell division cycle with zero corresponding to cytokinesis (M/G1-transition) (16,29,30). Such normalization is necessary as the individual experiments vary greatly in their absolute interdivision times, depending on the experimental conditions. Subsequent alignment of the timescales is also necessary, because different experiments release the cells from different points in the cell cycle. Finally, the expression values have been normalized to a standard deviation of one over the entire experiment to further aid comparison across experiments.

To provide an unbiased and comparable assessment of the expression data, a common computational analysis framework has been applied to all datasets in the database. For every expression profile, two P-values are calculated that assess the significance of periodicity and regulation (16). The P-values are summarized across all experiments in an organism and combined to a final score, which is used to rank all genes in the genome (16) (Figure 1C). A brief explanation of the algorithms is provided in the Methods section of Cyclebase. Based on independent benchmarking, this methodology has previously been shown to be as good as or superior to all other published methods for identifying periodically expressed genes (16,29,30). We have expanded this benchmark to also include recent methods (1,2,5–28) and experiments (Figure 2). Benchmark sets were compiled that are enriched in cell-cycle regulated genes from targets of known cell-cycle transcription factors (16,29,30). We benchmarked each method's ability to retrieve genes in these sets. Figure 2 displays the benchmarking results, which show that the method used in Cyclebase provides clear improvements over other methods and that combining all data for an organism is, not surprisingly, superior to any single dataset analyzed on its own. Based on the benchmarks, we have selected a set of significantly periodically expressed genes within each organism (labeled with a small ‘Periodic’ icon). We found 600 periodic genes in budding yeast, 500 in fission yeast, 600 in human and 400 in the plant A. thaliana. For these periodic genes, we compute the ‘peaktime’ based on all available expression profiles (16).
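The normalization steps described above (rescaling time to percent of the cell cycle, aligning experiments that start at different cell-cycle positions, and scaling expression values to a standard deviation of one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of (16): the function and parameter names are hypothetical, the release offset is assumed to be known for each experiment, and the population standard deviation without mean-centering is an arbitrary choice made here.

```python
import statistics


def to_cycle_percent(times_min, interdivision_min, release_offset_percent):
    """Map sampling times (minutes after release) onto percent of the cell cycle.

    interdivision_min: length of one full cell cycle in this experiment.
    release_offset_percent: assumed cell-cycle position at time zero,
        in percent, where 0 corresponds to the M/G1 transition.
    Values are not wrapped, so a second cycle runs from 100 to 200.
    """
    if interdivision_min <= 0:
        raise ValueError("interdivision time must be positive")
    return [release_offset_percent + 100.0 * t / interdivision_min
            for t in times_min]


def scale_to_unit_sd(values):
    """Divide all values by their standard deviation (population SD assumed)."""
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)  # constant input: nothing to scale
    return [v / sd for v in values]


# Made-up example: samples every 30 min, 120 min cycle, release at 25% of the cycle.
percent = to_cycle_percent([0, 30, 60, 120], interdivision_min=120,
                           release_offset_percent=25.0)
scaled = scale_to_unit_sd([2, 4, 4, 4, 5, 5, 7, 9])
```

After both steps, profiles from experiments with different interdivision times and release points share one time axis and one expression scale, which is what makes overlaying them in a single chart meaningful.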
Figure 2.

Benchmark of methods for identifying cell-cycle regulated genes. For each of the four organisms, a benchmark set was compiled of genes whose promoters are bound by known cell-cycle transcription factors (16,29,30), under the assumption that these genes should be highly overlapping with those that display cell-cycle regulation at the transcriptional level (i.e. periodic expression). The panels show the fraction of a benchmark set retrieved as a function of the number of genes suggested for each individual method (1,2,5–28). Better methods should therefore be towards the upper left corner of the plot. Methods that provide a ranked list of genes are displayed as a line, whereas those that only supply an unranked set of genes appear in the plots as a cross or plus sign. The black dotted line corresponds to picking genes randomly. In all four organisms, the combined analysis of all data within an organism presented by Cyclebase outperforms all existing methods or suggested sets of periodically expressed genes. In all organisms, the curves eventually display the same slope as the random performance curve (black dotted), indicating that including more genes from this point on yields no enrichment in genes from the benchmark set.
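The curves in this figure plot the fraction of a benchmark set retrieved against the number of genes suggested. A minimal sketch of that calculation for a ranked gene list is given below; the function names and the toy gene identifiers are hypothetical, and the random baseline assumes that every benchmark gene is part of the genome being sampled.

```python
def retrieval_curve(ranked_genes, benchmark_set):
    """Fraction of the benchmark set found within the top k genes, for k = 1..N."""
    if not benchmark_set:
        raise ValueError("benchmark set must not be empty")
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in benchmark_set:
            found += 1
        curve.append(found / len(benchmark_set))
    return curve


def random_expectation(k, genome_size):
    """Expected fraction of the benchmark set retrieved by k random picks."""
    return k / genome_size


# Toy example: "g9" is in the benchmark set but never suggested by the method.
ranked = ["g1", "g2", "g3", "g4", "g5"]
benchmark = {"g1", "g3", "g9"}
curve = retrieval_curve(ranked, benchmark)
```

An unranked gene set corresponds to a single point on such a plot (its size on the x-axis and the fraction of the benchmark set it contains on the y-axis), which is why those methods appear as individual marks rather than lines.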

The peaktime is a measure of when in the cell cycle a given gene is maximally expressed, and represents a summary of all the expression data (16). The peaktime is given as percent into the cell cycle (from when the new cell is born in cytokinesis) and is depicted as a red dot in both the expression profile chart (Figure 1B) and the peaktime chart (Figure 1D). The phase length can vary widely from organism to organism (e.g. G2-phase occupies ∼60–70% of the cell cycle in fission yeast versus only ∼25% in budding yeast), and the peaktime chart is therefore drawn differently for each species. Consequently, the peaktime values cannot be directly compared across organisms, since a specific percent (e.g. 60%) into the cell cycle may correspond to different phases in different organisms. The peaktime is only computed for genes that display periodicity and the remaining genes are labeled with ‘uncertain’ for the peaktime value. This label is also used if the different experiments disagree too much for a peaktime to be reliably assigned (16).

When comparing expression data across experiments, one issue is that different gene names for the same gene have been used in the different experiments. We have solved this problem by combining expression data and key results based on systematic gene identifiers. Where aliases exist, they are listed on the Gene Details page (Figure 1E), allowing the user to relate to the original experiment and to crosslink to external databases. The Gene Details page also contains a functional description (Figure 1E), which is populated from external databases (31–35) and is therefore not available for all genes.

All Cyclebase analysis results are available for download, both as values for individual genes and as whole-experiment datasets. XML and tab-delimited formats are available, both of which are fully documented on the website. Furthermore, where permission has been granted from the original authors, expression profile datasets are also available for download. Every page in Cyclebase also contains links to information about the database (FAQ and Methods), information about the individual experiments, and a link to the datasets available for download (Figure 1G).
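Because the peaktime is a position on a cycle, summarizing per-experiment peak estimates needs some care: peaks at 95% and 5% lie close together around the M/G1 transition, yet their arithmetic mean is 50%. The sketch below uses a circular mean to illustrate the point. It is a generic technique swapped in for illustration and is not the method of (16); the function name, the agreement measure and the `min_agreement` threshold are assumptions.

```python
import cmath
import math


def combine_peaktimes(peaks_percent, min_agreement=0.5):
    """Circular mean of per-experiment peak times given in percent of the cycle.

    Returns (peaktime, agreement). peaktime is None, standing in for the
    'uncertain' label, when the experiments disagree too much. agreement is
    the length of the mean unit vector (1 = identical peaks, 0 = no consensus).
    min_agreement is an arbitrary, hypothetical cut-off.
    """
    if not peaks_percent:
        return None, 0.0
    # Represent each peak as a point on the unit circle.
    vectors = [cmath.exp(2j * math.pi * p / 100.0) for p in peaks_percent]
    mean_vec = sum(vectors) / len(vectors)
    agreement = abs(mean_vec)
    if agreement < min_agreement:
        return None, agreement
    # Convert the angle of the mean vector back to percent in [0, 100).
    peak = (cmath.phase(mean_vec) / (2 * math.pi) * 100.0) % 100.0
    return peak, agreement


near_division, agreement_close = combine_peaktimes([95, 5])
opposite, agreement_far = combine_peaktimes([0, 50])
```

In this example the two peaks at 95% and 5% combine to a value at the M/G1 boundary rather than at mid-cycle, while peaks half a cycle apart yield no reliable summary.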

OUTLOOK

Many more cell-cycle experiments may be performed in the future, and we encourage researchers to contact us so that new cell-cycle experiments can be analyzed consistently and included in Cyclebase. As other types of large-scale experiments (e.g. metabolite information, kinase activity or protein expression) become available, it will become imperative that researchers integrate and analyze these data together with existing datasets. Cyclebase has been designed to store diverse data types from time-series experiments, and we intend for Cyclebase to become a standard interface and tool for combining cell-cycle datasets beyond transcriptional regulation. This would give researchers a one-stop shop for visualizing and downloading time-series events from the cell cycle.

REFERENCES (10 of 35 listed)

1.  Genome-wide gene expression in an Arabidopsis cell suspension.

Authors:  Margit Menges; Lars Hennig; Wilhelm Gruissem; James A H Murray
Journal:  Plant Mol Biol       Date:  2003-11       Impact factor: 4.076

2.  Identifying genes from up-down properties of microarray expression series.

Authors:  Karen Willbrand; Francois Radvanyi; Jean-Pierre Nadal; Jean-Paul Thiery; Thomas M A Fink
Journal:  Bioinformatics       Date:  2005-10-15       Impact factor: 6.937

3.  Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms.

Authors:  Earl F Glynn; Jie Chen; Arcady R Mushegian
Journal:  Bioinformatics       Date:  2005-11-22       Impact factor: 6.937

4.  Comparison of computational methods for the identification of cell cycle-regulated genes.

Authors:  Ulrik de Lichtenberg; Lars Juhl Jensen; Anders Fausbøll; Thomas S Jensen; Peer Bork; Søren Brunak
Journal:  Bioinformatics       Date:  2004-10-28       Impact factor: 6.937

5.  Co-evolution of transcriptional and post-translational cell-cycle regulation.

Authors:  Lars Juhl Jensen; Thomas Skøt Jensen; Ulrik de Lichtenberg; Søren Brunak; Peer Bork
Journal:  Nature       Date:  2006-09-27       Impact factor: 49.962

6.  Identifying cycling genes by combining sequence homology and expression data.

Authors:  Yong Lu; Roni Rosenfeld; Ziv Bar-Joseph
Journal:  Bioinformatics       Date:  2006-07-15       Impact factor: 6.937

7.  New weakly expressed cell cycle-regulated genes in yeast.

Authors:  Ulrik de Lichtenberg; Rasmus Wernersson; Thomas Skøt Jensen; Henrik Bjørn Nielsen; Anders Fausbøll; Peer Schmidt; Flemming Bryde Hansen; Steen Knudsen; Søren Brunak
Journal:  Yeast       Date:  2005-11       Impact factor: 3.239

8.  Identification of significant periodic genes in microarray gene expression data.

Authors:  Jie Chen
Journal:  BMC Bioinformatics       Date:  2005-11-30       Impact factor: 3.169

9.  Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.

Authors:  P T Spellman; G Sherlock; M Q Zhang; V R Iyer; K Anders; M B Eisen; P O Brown; D Botstein; B Futcher
Journal:  Mol Biol Cell       Date:  1998-12       Impact factor: 4.138

10.  The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community.

Authors:  Seung Yon Rhee; William Beavis; Tanya Z Berardini; Guanghong Chen; David Dixon; Aisling Doyle; Margarita Garcia-Hernandez; Eva Huala; Gabriel Lander; Mary Montoya; Neil Miller; Lukas A Mueller; Suparna Mundodi; Leonore Reiser; Julie Tacklind; Dan C Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971
