CircadiOmics: circadian omic web portal.

Nicholas Ceglia^1,2, Yu Liu^1,2, Siwei Chen^1,2, Forest Agostinelli^1,2, Kristin Eckel-Mahan^3, Paolo Sassone-Corsi^2,4,5, Pierre Baldi^1,2,4,5.

Abstract

Circadian rhythms play a fundamental role at all levels of biological organization. Understanding the mechanisms and implications of circadian oscillations continues to be the focus of intense research. However, there has been no comprehensive and integrated way of accessing and mining all circadian omic datasets. The latest release of CircadiOmics (http://circadiomics.ics.uci.edu) fills this gap by providing the most comprehensive web server for studying circadian data. The newly updated version contains 227 high-throughput omic datasets corresponding to over 74 million measurements sampled over 24 h cycles. Users can visualize and compare oscillatory trajectories across species, tissues and conditions. Periodicity statistics (e.g. period, amplitude, phase, P-value and q-value) obtained from BIO_CYCLE and other methods are provided for all samples in the repository and can easily be downloaded in the form of publication-ready figures and tables. New features and substantial improvements in performance and data volume make CircadiOmics a powerful web portal for integrated analysis of circadian omic data.

Year: 2018 PMID: 29912458 PMCID: PMC6030824 DOI: 10.1093/nar/gky441

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Circadian rhythms are a ubiquitous phenomenon in biology that is deeply rooted in evolution (1,2). Circadian oscillations of molecular species maintain homeostatic balance by regulating a variety of physiological and metabolic processes. These processes include the sleep/wake cycle, hormone secretion, diet-related metabolism and neural function (3–6). Disruption of circadian rhythms can lead to a wide range of health problems such as diabetes, obesity and premature aging (7–11). It is well known that circadian oscillations at the transcriptomic level are pervasive and well coordinated (4,12,2). Oscillation in transcription is strongly regulated by a number of key transcription factors, such as CLOCK, BMAL1, PERs and CRYs, that comprise the core clock (13). These transcript-level oscillations form regulatory feedback loops that oscillate throughout the transcriptome (14–15,2). Moreover, a large number of metabolites and proteins in cells exhibit circadian oscillations and may play a key role within the organization of genetic circadian regulation (16–19). Strikingly, the circadian landscape in a cell can be drastically different depending on genetic and epigenetic conditions (17,12,2,20). The process by which these circadian landscapes evolve is understood as circadian reprogramming. Reprogramming can be induced by external perturbations such as inflammation or dietary challenge (21–24). The large repository of omic data provided in CircadiOmics, together with several comparative analysis tools, provides a foundational platform that can be used to analyze these complex mechanisms and their implications.

MATERIALS AND METHODS

Dataset collection

The omic datasets available on CircadiOmics are compiled from project collaborations, automated discovery and manual curation. Over 6400 individual time points spanning 227 separate circadian experiments are available for search and visualization. In aggregate, these datasets form the largest single repository of circadian data available and include all datasets from other repositories such as CircaDB (25). Table 1 shows a breakdown of the number of datasets available from several other sources. Eight species are currently available on CircadiOmics. The majority of datasets are collected from Mus musculus and Papio anubis.

Table 1.

Data volumes of publicly available circadian omic databases

Source	Experiments	Tissues	Species	Total data pts. (est.)
CircadiOmics	227	23	8	≈74 600 000
CircaDB	30	15	2	<1 800 000
DIURNAL	11	3	3	≈3 009 600
BIOCLOCK	2	2	2	≈3 600 000
CirGRDB	50	<20	2	≈9 000 000

Comparison of CircadiOmics with other circadian repositories. Experiments refers to the total number of experiment-level datasets from each source. An experiment-level dataset should contain at least two time points, more than one replicate at each time point, and time series data in each replicate for a substantial number of molecular species (at least 1000 for transcriptome and acetylome, and at least 100 for metabolome and proteome). Total data points provide an estimate of the total number of individual measurements taken across different time points, replicates and molecular species. Numbers are collected from internal statistics for CircadiOmics and from publications, or official websites, for the other sources. Details are provided in Supplementary Material.

Over 62 tissues grouped into 18 categories are represented in the database. Within these categories, liver and brain experiments comprise the majority. Diverse experimental conditions grouped into nine broad categories are available for comparison. Unique conditions include chronic and acute ethanol consumption, high-fat diet, traumatic brain injury, fibroblasts undergoing myogenic reprogramming and several cancer-specific datasets (26,27). Finally, CircadiOmics is the only tool that includes transcriptome, metabolome, acetylome and proteome experiments. Figure 1 summarizes the number of available datasets by detailed categories. The full table of datasets is available, with a short description and experimental details such as number of replicates, on the CircadiOmics web portal.
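The experiment-level criteria stated in the Table 1 caption can be written down as a small check. The following is a minimal sketch, not the portal's code: the function names and the dictionary layout are hypothetical, and the data point estimate assumes a balanced design in which every molecular species is measured in every replicate at every time point.

```python
# Minimum number of molecular species per omic type, taken from the Table 1 caption.
MIN_SPECIES = {
    "transcriptome": 1000,
    "acetylome": 1000,
    "metabolome": 100,
    "proteome": 100,
}


def is_experiment_level(omic_type, n_time_points, n_replicates, n_species):
    """Return True if a dataset meets the stated experiment-level criteria."""
    return (
        n_time_points >= 2          # at least two time points
        and n_replicates > 1        # more than one replicate per time point
        and n_species >= MIN_SPECIES[omic_type]
    )


def estimated_data_points(n_time_points, n_replicates, n_species):
    """Estimate individual measurements, assuming a balanced design."""
    return n_time_points * n_replicates * n_species


# Hypothetical example: 6 time points, 3 replicates, 20 000 transcripts.
print(is_experiment_level("transcriptome", 6, 3, 20000))
print(estimated_data_points(6, 3, 20000))
```

Summing this estimate over all experiments from one source would give a figure comparable to the last column of Table 1, under the same balanced-design assumption.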

Figure 1.

Dataset collection by species, tissues, experimental conditions and omic categories.

Increased interest in circadian rhythms is driving a continuous increase in publicly available omic datasets. Automated discovery of datasets has become necessary to maintain the most current and comprehensive repository. A Python framework built with the scholarly and geotools Python packages is used to continuously search the literature for new circadian omic studies and datasets. Candidates found by keyword searches in published abstracts are filtered using several features, including the publishing journal, the authors and the supplementary materials provided. A logistic regression step is used to classify datasets that are good candidates for inclusion in CircadiOmics. Results produced by this automated pipeline are then manually inspected for quality, based primarily on the time point resolution of the dataset. The minimum sampling density for any dataset in the repository is every eight hours over a 24-h cycle. Additionally, the CircadiOmics team and collaborating biologists periodically search recent publications for new datasets that qualify for inclusion in CircadiOmics.
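The two filtering ideas in this paragraph, a logistic score for candidate studies and a minimum sampling density, can be illustrated as follows. This is a hedged sketch only: the feature names, weights, bias and function names are invented for illustration (a real model would learn its weights from labelled examples), and the sampling check encodes one possible reading of "every eight hours over a 24-h cycle".

```python
import math

# Hypothetical feature weights and bias; NOT the values used by CircadiOmics.
WEIGHTS = {"keyword_hits": 0.8, "known_journal": 1.2, "has_supplement": 1.5}
BIAS = -3.0


def candidate_probability(features):
    """Logistic regression score: sigmoid of a weighted sum of features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def meets_sampling_density(time_points_h, max_gap_h=8.0, cycle_h=24.0):
    """Check that samples are at most max_gap_h apart and cover one cycle.

    Assumed reading: consecutive gaps must not exceed max_gap_h, and the
    sampled span plus one gap must reach a full cycle (e.g. 0, 8, 16 h).
    """
    times = sorted(time_points_h)
    if len(times) < 2:
        return False
    gaps_ok = all(b - a <= max_gap_h for a, b in zip(times, times[1:]))
    covers_cycle = (times[-1] - times[0]) + max_gap_h >= cycle_h
    return gaps_ok and covers_cycle


study = {"keyword_hits": 3, "known_journal": 1, "has_supplement": 1}
print(candidate_probability(study) >= 0.5)      # candidate passes the classifier
print(meets_sampling_density([0, 4, 8, 12, 16, 20]))
```

In the workflow described above, datasets passing such automated checks would still go through manual inspection before inclusion.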

Statistics

All datasets are processed with both BIO_CYCLE and JTK_CYCLE to provide oscillation statistics (e.g. period, amplitude, phase) for each set of samples (28,29). Primary identification of oscillatory species is made using P-values and accompanying q-values at a selected threshold. Technical details for calculating P-values and q-values are provided in the cited articles for the respective methods. BIO_CYCLE results have consistently been shown to improve on older methods in determining periodicity (28). The BIO_CYCLE portal within CircadiOmics at http://circadiomics.ics.uci.edu/biocycle allows users to upload an unpublished dataset for processing with BIO_CYCLE. For each experiment and each molecular species, the individual P-value, q-value, period, amplitude and phase can be obtained. Additionally, summary figures are generated for the distribution of each statistic in the user-provided dataset. Trends for individual trajectories in user-provided data are available for search and visualization through the supplied set of molecular IDs. An example dataset is provided to give the user a sample of portal features and a template for the desired data format. The main CircadiOmics documentation page provides additional guidance. The BIO_CYCLE R package is also available for download through the main portal.
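To make the relationship between P-values, q-values and a selection threshold concrete, the sketch below applies the Benjamini-Hochberg adjustment, one widely used way of turning a list of P-values into q-values. It is an illustration under that assumption; the exact procedures used by BIO_CYCLE and JTK_CYCLE are described in the cited articles and may differ. The function names and the example P-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending P
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


def select_oscillating(ids, pvalues, q_threshold=0.05):
    """Return the IDs whose q-value is at or below the chosen threshold."""
    return [i for i, q in zip(ids, bh_qvalues(pvalues)) if q <= q_threshold]


# Hypothetical P-values for four molecular species.
ids = ["ARNTL", "PER1", "CRY1", "GAPDH"]
pvals = [0.01, 0.04, 0.03, 0.5]
print(bh_qvalues(pvals))
print(select_oscillating(ids, pvals))
```

Because q-values account for the number of tests, a species with P below 0.05 may still fall above the same threshold on the q-value scale, as the example shows for all but the first entry.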

Implementation

CircadiOmics is available as a public domain website at http://circadiomics.ics.uci.edu. The CircadiOmics web application is constructed as a three-tier Model-View-Controller architecture. The web server is implemented with the Flask Python library. The interface is generated dynamically with Twitter Bootstrap and Google Charts. Fast query response times are accomplished by caching JSON-serialized datasets on disk as the server is started. Figure 2 describes the web application architecture and corresponding technology. The interface loads with an example search of ARNTL (CLOCK-BMAL) in a sample liver control dataset. Dynamic filtering of the available datasets is provided based on tissue and experimental perturbations. Examples of filtering options are provided in the documentation on the main web server in the context of various sample workflows. Downloadable results for each search include high-resolution images in PNG or SVG format, and an Excel table of BIO_CYCLE reported statistics. Dataset documentation includes a short technical description as well as a link to the corresponding article in PubMed. Finally, additional help information on the features of CircadiOmics is provided through a link on the main page of the web server.
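The caching of JSON-serialized datasets could look roughly like the sketch below. It is an assumption-laden illustration rather than the server's code: it supposes one JSON file per dataset, mapping molecule names to time series, read once at startup so that queries become dictionary lookups. The directory layout, file naming and both function names are hypothetical, and the Flask routing layer is deliberately left out.

```python
import json
from pathlib import Path


def load_dataset_cache(cache_dir):
    """Read every *.json file in cache_dir into a dict keyed by file stem.

    Intended to run once, when the server starts.
    """
    cache = {}
    for path in sorted(Path(cache_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            cache[path.stem] = json.load(handle)
    return cache


def lookup(cache, dataset_id, molecule):
    """Return the cached series for one molecule, or None if it is absent."""
    return cache.get(dataset_id, {}).get(molecule)
```

A request handler would then call `lookup(cache, dataset_id, molecule)` and serialize the result, without re-parsing any file per query.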

Figure 2.

Three-tier Model-View-Controller architecture of the CircadiOmics web portal. Intelligent data discovery supplies candidate datasets for inclusion in the repository using a machine learning filter applied to keyword features derived from web crawling of published abstracts. BIO_CYCLE results are obtained and stored for all datasets. The user interface sends requests to the web server and displays the results, allowing for interactive hypothesis generation and scientific discovery.

RESULTS

Features

The main functionality of CircadiOmics is the search, comparison and visualization of oscillation trends. The user can search any molecular species in the omic datasets within the repository and overlay multiple searches together to initiate a comparative study. A typical workflow may consist of comparing a set of specific transcripts, metabolites or proteins among several datasets. Intelligent auto-completion facilitates user queries within the currently selected dataset. Searches can be performed individually or in batch on a selected dataset. When datasets do not have the same time course, results are displayed from the minimum to the maximum time point over all selected datasets. The query result for a set of example searches is shown in Figure 3. Documentation available on the web server illustrates common query tasks and results. Datasets with large differences in intensity values at each time point can be dynamically scaled for easy visual comparison. Minimum and maximum values are normalized to zero and one, respectively.
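The scaling and the shared time axis described above amount to min-max normalization and a range taken over all selected datasets. A minimal sketch follows; the function names and the handling of a constant series (mapped to zeros) are assumptions made for the example.

```python
def min_max_scale(values):
    """Rescale a series so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range; returning zeros is an assumed convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def shared_time_range(time_courses):
    """Return (min, max) time point over all selected datasets."""
    all_times = [t for course in time_courses for t in course]
    return min(all_times), max(all_times)


# Two hypothetical trajectories on very different intensity scales.
print(min_max_scale([10, 20, 30]))
print(min_max_scale([1000, 4000, 2500]))
print(shared_time_range([[0, 4, 8, 12, 16, 20], [2, 6, 10, 14, 18, 22]]))
```

After scaling, both trajectories lie in the same interval and can be overlaid in one plot regardless of their original units.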

Figure 3.

Visualization of queries for ARNTL, PER1 and CRY1 in a control mouse dataset. Any number of queries, across any number of datasets, can be displayed simultaneously.

A table of statistics is compiled and displayed beneath the main search window after each query. Statistics can be updated dynamically to reflect results obtained with BIO_CYCLE. The table can be downloaded in several formats compatible with Excel. Individual searches can be removed from both the search view and the statistics table. Figure 3 shows an example result obtained from searches for ARNTL, PER1 and CRY1 in an example dataset. With a rapidly expanding dataset collection, filtering candidate datasets within the interface has become necessary. The filtering menu allows the user to limit the scope of datasets displayed under drop-down menus for each dataset type. Filtering can be done by species, tissues and experimental conditions. Similar experimental conditions are categorically grouped together in the filtering menu. These include knock-downs, knock-outs, diet changes and drug treatments. The full set of available conditions for filtering is summarized in Figure 1. The search interface uses an abbreviated dataset identification. Upon selection of a dataset, the user can quickly verify the source of the data through a corresponding literature citation. Additional details for each dataset can be found in tabular form under the dataset tab. These details include a brief description of the experimental protocol. The Metabolic Atlas web portal (http://circadiomics.ics.uci.edu/metabolicatlas) is also available under the CircadiOmics umbrella. In addition to metabolite time series, interactive metabolic networks can be generated and visualized. These networks are derived in part from the KEGG database (30) and can be filtered using BIO_CYCLE statistics.
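The filtering of datasets by species, tissue and experimental condition mentioned in this section can be pictured as a simple metadata filter. The sketch below is illustrative only: the metadata keys, the dataset identifiers and the function name are hypothetical and do not reflect the portal's internal schema.

```python
def filter_datasets(datasets, species=None, tissue=None, condition=None):
    """Keep datasets matching every criterion that is not None."""
    wanted = {"species": species, "tissue": tissue, "condition": condition}
    return [
        d for d in datasets
        if all(value is None or d.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical metadata records.
catalog = [
    {"id": "liver_ctrl", "species": "Mus musculus", "tissue": "liver", "condition": "control"},
    {"id": "liver_hfd", "species": "Mus musculus", "tissue": "liver", "condition": "diet change"},
    {"id": "brain_ctrl", "species": "Mus musculus", "tissue": "brain", "condition": "control"},
]

print([d["id"] for d in filter_datasets(catalog, tissue="liver")])
print([d["id"] for d in filter_datasets(catalog, tissue="liver", condition="control")])
```

Leaving a criterion unset keeps it unrestricted, which mirrors narrowing a drop-down list one menu at a time.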

Improvements

The new version of CircadiOmics considerably increases the amount of data available to the user. In particular, the number of experiment-level datasets increased from 50 to 227, the number of species increased from 1 to 8, the number of transcriptomic datasets increased from 40 to 169, the number of proteomic datasets increased from 1 to 8, the number of acetylome datasets increased from 1 to 8 and the number of metabolomic datasets increased from 5 to 32. Beyond the multi-fold increase in the underlying data repository, the new version of CircadiOmics comes with several other significant improvements, including a new, more robust architecture and software infrastructure. In addition, all circadian statistics are computed using the latest version of BIO_CYCLE, with the capability to systematically apply any updates on the fly as new versions of BIO_CYCLE are created and released. Thus, together with intelligent data discovery, CircadiOmics provides state-of-the-art statistical tools for integrating and analyzing circadian data. The server-side code has improved security through encrypted HTTPS connections and enables user-specific content visibility for unpublished data. In combination, the new features enable CircadiOmics users to conduct end-to-end circadian analyses, starting from the generation of new hypotheses all the way to the generation of results suitable for publication.

DISCUSSION

Central to the study of circadian rhythms are large-scale reprogramming events. Understanding these events at the molecular level critically depends on being able to access and compare significant amounts of high-throughput circadian omic data. CircadiOmics, with its advanced search features and unprecedented amount of high-quality circadian data, is a primary enabling tool for such studies. In a circadian reprogramming event, changes in oscillation of one molecular species can often be related to changes in other molecular species (31,2). One of the main qualities of CircadiOmics is the flexibility of the comparative analyses it enables. For instance, a user can compare transcripts across species, or relate metabolites to proteins and transcripts and identify underlying oscillatory trends. An important example can be seen in the loss of oscillation in the metabolite NAD+ as a response to changes in the transcriptomic oscillatory landscape (17). As a result, CircadiOmics has proven to be highly effective for hypothesis generation in new studies. To date, the web server has contributed to multiple studies that have been published in high-impact journals. The server was accessed more than 250 000 times in 2017 alone. Figure 4 details some examples of the impact of CircadiOmics. For instance, Eckel-Mahan et al. utilized CircadiOmics to analyze three related omic datasets in mouse liver (17). They found that core clock genes regulate the acetylation of the enzyme AceCS1. AceCS1 is responsible for changes in the oscillation of the metabolite acetyl-CoA, a key metabolite involved in fatty acid synthesis (Figure 4 A). Similarly, Masri et al. compared liver transcriptomic data with metabolomic data in mice afflicted with cancer using CircadiOmics (Figure 4 B). They discovered that a distal tumor-bearing lung can reprogram the liver circadian transcriptome through inflammatory pathways and insulin-related metabolic pathways (27).
More recently, CircadiOmics has been used to examine the role of circadian regulation in myogenic reprogramming of fibroblasts (https://www.biorxiv.org/content/early/2017/06/18/151555). It was observed that the core clock is completely disrupted during this process. However, exogenous MYOD1 gains rhythmicity during the transition to muscle cells. As a result, MYOG and a majority of the critical transcription factors related to muscle development that are known to be regulated by MYOD1 synchronize their oscillation. This behavior was identified in CircadiOmics through visualization and confirmed by the phase lag reported by BIO_CYCLE (Figure 4 C). Finally, aggregating all mouse transcriptomic datasets confirms and amplifies the notion that circadian oscillations are pervasive: 93.5% of all possible protein coding transcripts exhibit circadian oscillations in at least one tissue or experiment (up from about 67% in (2)) (Figure 4 D). The large number of datasets in CircadiOmics facilitates these kinds of integrative analyses. Additional analysis of the 1275 protein coding transcripts that are not found to oscillate in any condition or tissue is provided in Supplementary Table S2.
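The aggregation behind the pervasiveness estimate is a cumulative union: for each experiment, collect the transcripts whose P-value passes the threshold, then track how the union grows as experiments are added. The sketch below illustrates that idea with a P ≤ 0.05 cut-off; the input layout (one dict of transcript to P-value per experiment), the descending ordering of experiments and the function name are assumptions for illustration, and the toy data are invented.

```python
def cumulative_union(experiments, alpha=0.05):
    """Return (oscillating_count, cumulative_union_size) per experiment.

    experiments: list of dicts mapping transcript ID -> P-value.
    Experiments are ordered by their oscillating count (descending, assumed).
    """
    oscillating = [
        {transcript for transcript, p in exp.items() if p <= alpha}
        for exp in experiments
    ]
    oscillating.sort(key=len, reverse=True)
    seen = set()
    trend = []
    for transcripts in oscillating:
        seen |= transcripts
        trend.append((len(transcripts), len(seen)))
    return trend


# Toy data: three experiments over a universe of five transcripts (a to e).
toy = [
    {"a": 0.01, "b": 0.20, "c": 0.04},
    {"a": 0.03, "d": 0.50},
    {"b": 0.05, "e": 0.001, "a": 0.01},
]
trend = cumulative_union(toy)
print(trend)
print(trend[-1][1] / 5)  # fraction oscillating in at least one experiment
```

The final cumulative count divided by the number of possible transcripts gives the "at least one tissue or experiment" fraction discussed above.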

Figure 4.

Selected examples of the impact of CircadiOmics. (A) CircadiOmics was used to link a multitude of circadian metabolites with functionally related circadian transcripts. Figure taken from Figure 5A of (17). (B) CircadiOmics was used to discover reprogrammed circadian transcripts and metabolites related to inflammatory and energy pathways. Figure taken from Figures 2E, 4B and 5D of (27). (C) Exogenous MYOD1, during MEF myogenic reprogramming, entrains oscillation in MYOG and related targets in the absence of oscillation of the core clock (https://www.biorxiv.org/content/early/2017/06/18/151555). (D) Bar heights show the ordered number of oscillating protein coding transcripts with P ≤ 0.05 in each mouse transcriptomic experiment in the repository. The trend is the cumulative union of oscillating transcripts. Over 93% of possible protein coding transcripts are found to oscillate in at least one tissue or condition across all mouse datasets.

The latest release of CircadiOmics is the largest single repository of circadian omic data available. Updates in server architecture and data mining ensure that CircadiOmics will continue to be maintained and to grow as new data are published. Improvements in features for search and visualization expand the possibilities for the study of circadian rhythms in omic datasets. These possibilities include generating specific hypotheses for individual experiments and answering larger questions about the organization of oscillation within a cell.

30 in total

1. KEGG: kyoto encyclopedia of genes and genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets.

Authors: Michael E Hughes; John B Hogenesch; Karl Kornacker
Journal: J Biol Rhythms Date: 2010-10 Impact factor: 3.182

3. The pervasiveness and plasticity of circadian oscillations: the coupled circadian-oscillators framework.

Authors: Vishal R Patel; Nicholas Ceglia; Michael Zeller; Kristin Eckel-Mahan; Paolo Sassone-Corsi; Pierre Baldi
Journal: Bioinformatics Date: 2015-06-06 Impact factor: 6.937

Review 4. Circadian topology of metabolism.

Authors: Joseph Bass
Journal: Nature Date: 2012-11-15 Impact factor: 49.962

5. Cancer inhibition through circadian reprogramming of tumor transcriptome with meal timing.

Authors: Xiao-Mei Li; Franck Delaunay; Sandrine Dulong; Bruno Claustrat; Sinisa Zampera; Yoshiro Fujii; Michèle Teboul; Jacques Beau; Francis Lévi
Journal: Cancer Res Date: 2010-04-15 Impact factor: 12.701

Review 6. Clocks, metabolism, and the epigenome.

Authors: Dan Feng; Mitchell A Lazar
Journal: Mol Cell Date: 2012-07-27 Impact factor: 17.970

7. Circadian acetylome reveals regulation of mitochondrial metabolic pathways.

Authors: Selma Masri; Vishal R Patel; Kristin L Eckel-Mahan; Shahaf Peleg; Ignasi Forne; Andreas G Ladurner; Pierre Baldi; Axel Imhof; Paolo Sassone-Corsi
Journal: Proc Natl Acad Sci U S A Date: 2013-01-22 Impact factor: 11.205

Review 8. The Circadian Clock and Human Health.

Authors: Till Roenneberg; Martha Merrow
Journal: Curr Biol Date: 2016-05-23 Impact factor: 10.834

9. Transcriptional architecture and chromatin landscape of the core circadian clock in mammals.

Authors: Nobuya Koike; Seung-Hee Yoo; Hung-Chung Huang; Vivek Kumar; Choogon Lee; Tae-Kyung Kim; Joseph S Takahashi
Journal: Science Date: 2012-08-30 Impact factor: 47.728

Review 10. When brain clocks lose track of time: cause or consequence of neuropsychiatric disorders.

Authors: Jerome S Menet; Michael Rosbash
Journal: Curr Opin Neurobiol Date: 2011-07-05 Impact factor: 6.627
