
CircadiOmics: circadian omic web portal.

Muntaha Samad1,2, Forest Agostinelli3, Tomoki Sato4, Kohei Shimaji4, Pierre Baldi1,2.   

Abstract

Circadian rhythms are a foundational aspect of biology. These rhythms are found at the molecular level in every cell of every living organism, and they play a fundamental role in homeostasis and in a variety of physiological processes. As a result, biomedical research on circadian rhythms continues to expand at a rapid pace. To support this research, CircadiOmics (http://circadiomics.igb.uci.edu/) provides the largest annotated repository and analytic web server for high-throughput omic (e.g. transcriptomic, metabolomic, proteomic) circadian time-series experimental data. CircadiOmics contains over 290 experiments and over 100 million individual measurements, across more than 20 unique tissues/organs and 11 different species. Users can visualize and mine these datasets by deriving and comparing periodicity statistics for oscillating molecular species, including period, amplitude, phase, P-value and q-value. These statistics are computed with BIO_CYCLE and JTK_CYCLE and are aggregated and displayed side by side for comparison. CircadiOmics is the most up-to-date web portal for searching and analyzing circadian omic data and is used by researchers around the world.
© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2022        PMID: 35657089      PMCID: PMC9252794          DOI: 10.1093/nar/gkac419

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   19.160


INTRODUCTION

Circadian rhythms are found in plants, animals, fungi and cyanobacteria and are fundamental to biology (1–4). They date back to the first cyanobacteria, near the origins of life on Earth, and over the ∼2 trillion revolutions of the Earth on its axis since then they have become deeply etched in the molecular machinery of all cells. Disruptions of circadian rhythms have been linked to health problems such as cancer, diabetes, obesity and premature aging (2,5–12). Modern high-throughput technologies have made it possible to investigate circadian rhythms at the molecular level, and measuring the concentrations of molecular species across time has shown that circadian oscillations are pervasive in all living cells (4,13,14). Circadian oscillations are generated by feedback loops that are regulated in part by the ‘core clock’ (15). The core clock is a classical inhibitory transcription–translation feedback loop that is highly conserved from animals to plants. Although the core clock comprises only about a dozen genes, including CLOCK, BMAL1, PER1, PER2, CRY1 and CRY2, roughly 10% of all transcripts and metabolites display circadian oscillations in any given experiment (16–22). However, the complement of molecular species exhibiting circadian oscillations varies greatly with genetic, epigenetic, tissue/organ, health, age and environmental conditions. This circadian reprogramming is a major target of active investigation aimed at understanding how environmental conditions, such as drug treatments or diets, affect circadian oscillations, and how oscillations in different cells and tissues are coordinated and interact with each other (19,23–30). The large repository of omic data available via CircadiOmics serves as an invaluable resource for analyzing the complexity of circadian mechanisms and their downstream implications. 
The CircadiOmics interface is especially advantageous and unique because it allows users to easily perform comparative analyses and aggregated inferences about circadian rhythms at the molecular level across species, tissues/organs, and genetic, epigenetic and environmental conditions.

MATERIALS AND METHODS

Datasets

CircadiOmics currently contains over 290 omic datasets with over 100 million individual measurements, across more than 20 unique tissues/organs and 11 different species. For simplicity, we group the unique tissue/organ types into 13 categories: liver, brain, digestive, skin, serum, muscle, adipose, glands, cells, kidney, heart, eye and other. Figure 1B shows the number of datasets in the repository for each tissue/organ category. The species currently represented in CircadiOmics are Aedes aegypti, Anopheles gambiae, Arabidopsis thaliana, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Neurospora crassa, Papio anubis, Rattus norvegicus and Macaca mulatta (rhesus macaque). Figure 1A shows the number of datasets per species. This makes CircadiOmics the most extensive, comprehensive and current repository for circadian data. For comparison, Table 1 shows the number and types of datasets currently available in the most prominent circadian data repositories (31–33) and [Li, D., Yang, R., Miao, Z., and Tao, W. (2010) BioClock: a web server and database aimed for interpreting circadian rhythm. Intelligent Systems for Molecular Biology 2010 meeting]. The majority of datasets in CircadiOmics come from Mus musculus (mouse) and Papio anubis (baboon) and from liver and brain tissues. Beyond its wide variety of species and tissues, CircadiOmics also represents a diverse set of experimental conditions, including knock-downs, knock-outs, diet changes, exercise and drug treatments. In addition, CircadiOmics uniquely contains data from different types of omic experiments, including transcriptome, metabolome, proteome and acetylome experiments. Figure 1 summarizes the number of available datasets by detailed category. 
The full table summarizing all of the datasets is available on the CircadiOmics web portal with a short explanation of the dataset, a brief description of the experimental protocol, the citation, the GEO accession number, and other summary information.
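The grouping of raw tissue/organ labels into the 13 display categories described above can be illustrated with a small lookup table and a tally like the one behind the per-category breakdown. This is a hypothetical sketch: only the 13 category names come from the text; the raw tissue labels, the mapping, and the toy dataset records are invented for illustration and are not the CircadiOmics schema.

```python
from collections import Counter

# The 13 display categories named in the text, in display order.
CATEGORIES = [
    "liver", "brain", "digestive", "skin", "serum", "muscle", "adipose",
    "glands", "cells", "kidney", "heart", "eye", "other",
]

# Hypothetical mapping from raw tissue/organ labels to display categories.
TISSUE_TO_CATEGORY = {
    "liver": "liver",
    "suprachiasmatic nucleus": "brain",
    "hypothalamus": "brain",
    "jejunum": "digestive",
    "colon": "digestive",
    "epidermis": "skin",
    "plasma": "serum",
    "skeletal muscle": "muscle",
    "white adipose tissue": "adipose",
    "pituitary": "glands",
    "fibroblasts": "cells",
    "kidney": "kidney",
    "heart": "heart",
    "retina": "eye",
}


def categorize(tissue: str) -> str:
    """Map a raw tissue label to a display category; unknown labels fall into 'other'."""
    return TISSUE_TO_CATEGORY.get(tissue.strip().lower(), "other")


def breakdown(datasets: list[dict]) -> dict[str, int]:
    """Count datasets per category, returned in the fixed display order."""
    counts = Counter(categorize(d["tissue"]) for d in datasets)
    return {c: counts.get(c, 0) for c in CATEGORIES}


# Toy records, invented for illustration only.
toy = [
    {"id": "ds1", "species": "Mus musculus", "tissue": "Liver"},
    {"id": "ds2", "species": "Papio anubis", "tissue": "hypothalamus"},
    {"id": "ds3", "species": "Mus musculus", "tissue": "jejunum"},
    {"id": "ds4", "species": "Danio rerio", "tissue": "whole larvae"},
]
print(breakdown(toy))
```

Keeping an explicit "other" bucket means a newly added tissue never silently disappears from the counts; it simply shows up under "other" until the mapping is extended.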
Figure 1.

Breakdown of datasets by species, tissue, experimental conditions and omic categories.

Table 1.

Comparison of CircadiOmics with other circadian web servers

Source          Datasets   Tissues   Species
CircadiOmics    299        25        11
CircaDB         43         15        2
BIOCLOCK        2          2         2
CirGRDB         99         <20       2

Dataset collection

The datasets in CircadiOmics are collected through research collaborations, automated discovery and manual discovery. The two main automated approaches used to identify newly available circadian datasets are a web crawler developed in-house and the publicly available web service PubCrawler (34). The in-house web crawler uses the Python packages scholarly and geotools to search the literature for new circadian omic studies and their affiliated datasets. To find new datasets, the crawler performs keyword searches on published abstracts, extracts various features from the published articles, and then applies logistic regression to the extracted features to classify whether a dataset is a good candidate for inclusion in CircadiOmics. The datasets discovered by the crawler are then manually vetted and processed for inclusion. Using this crawler in tandem with PubCrawler, which sends a daily email listing possible publications of interest, we keep CircadiOmics current by continuously adding the latest circadian research to the repository. Additionally, the CircadiOmics team and collaborating biologists contribute datasets from collaborative research projects and periodically search recent publications manually to complement the data obtained through the automated discovery tools.
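The keyword-features-plus-logistic-regression step can be sketched in plain Python. Everything specific here is an assumption: the keyword list, the three features, the toy training abstracts, and the gradient-descent fit are illustrative stand-ins, not the features or model of the in-house crawler, and the scholarly/geotools calls that fetch abstracts are deliberately omitted.

```python
import math
import re

# Hypothetical keywords and features; the crawler's real feature set is not specified.
KEYWORDS = ("circadian", "diurnal", "zeitgeber", "rhythm", "clock")
GEO_PATTERN = re.compile(r"\bGSE\d+\b")  # GEO series accession, e.g. GSE12345
OMIC_PATTERN = re.compile(r"transcriptom|metabolom|proteom|RNA-seq|microarray", re.I)


def extract_features(abstract: str) -> list[float]:
    """Return [keyword hit count, has GEO accession, mentions an omic assay]."""
    text = abstract.lower()
    hits = sum(text.count(k) for k in KEYWORDS)
    return [
        float(hits),
        1.0 if GEO_PATTERN.search(abstract) else 0.0,
        1.0 if OMIC_PATTERN.search(abstract) else 0.0,
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights (last entry is the bias) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi  # gradient of the log-loss with respect to the logit
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_proba(w, x) -> float:
    """Probability that an abstract with features x is a good candidate."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


# Toy labeled abstracts (1 = good candidate), invented for illustration.
train = [
    ("Circadian transcriptome of mouse liver sampled every 4 h; data in GSE11111.", 1),
    ("Diurnal metabolomic rhythms in plasma across the circadian clock cycle, GSE22222.", 1),
    ("Clock gene rhythm in zebrafish measured by RNA-seq (GSE33333).", 1),
    ("A survey of hospital staffing policies.", 0),
    ("Crystal structure of a bacterial enzyme.", 0),
    ("Sleep questionnaire outcomes in adolescents.", 0),
]
X = [extract_features(t) for t, _ in train]
y = [label for _, label in train]
w = fit_logistic(X, y)

candidate = "Circadian proteome of the mouse SCN, deposited as GSE44444."
print(round(predict_proba(w, extract_features(candidate)), 3))
```

In a real pipeline a library implementation (and many more labeled abstracts) would replace the hand-written fit; the point here is only the shape of the step: text in, a small numeric feature vector, a probability out, and a cutoff that decides what goes to manual vetting.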

RESULTS

Features

The main focus of CircadiOmics is the search function, which allows users to compare and visualize the oscillation trends of molecular species. The user can select one or more datasets from the repository and search for any molecular species. CircadiOmics can overlay multiple searches to enable comparative studies and normalizes the output for easy visual comparison. For each query, a table of periodicity statistics (period, amplitude, phase, P-value and q-value) is displayed. These statistics are calculated using BIO_CYCLE and JTK_CYCLE (35,36). Molecular species are called circadian when their P-values and accompanying q-values fall below a user-selected threshold. In addition to choosing whether to view statistics from BIO_CYCLE or JTK_CYCLE, users can filter datasets by species, tissue/organ and experimental condition. Beyond the search functionality, we have created the BIO_CYCLE web server (http://circadiomics.igb.uci.edu/biocycle) to assist with the analysis of circadian experiments. The web server runs the latest version of the BIO_CYCLE software on user-uploaded omic time-series datasets and provides easy-to-use analysis tools, including histograms of periods, phases, amplitudes and offsets; querying of molecular species by a user-selected P-value or q-value cutoff; visualization of molecular concentrations across time; and analysis at 24, 12 and 8 h periods. To use the web server, users upload a file containing the concentration measurements of molecular species across time points (e.g. transcript levels measured every 4 h). Each row must contain the ID of a molecular species, followed by its concentration measurement at each time point; each column corresponds to a different time point or replicate. 
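The upload layout described above (one row per molecular species: an ID followed by one measurement per column, where each column is a time point or replicate) can be produced and checked with the standard library. The tab-separated layout, the header row, the column names such as ZT0_rep1, and treating an empty cell as a missing value are assumptions for illustration; the BIO_CYCLE server's own instructions define the exact format it accepts.

```python
import csv
import io


def write_upload_table(ids, columns, values) -> str:
    """Serialize measurements as tab-separated text: a header row, then ID + one value per column.

    ids: molecular-species IDs; columns: hypothetical time-point/replicate labels;
    values: one list of numbers per ID, aligned with `columns`.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", *columns])
    for species_id, row in zip(ids, values):
        writer.writerow([species_id, *row])
    return buf.getvalue()


def validate_upload_table(text: str) -> list[str]:
    """Return a list of problems: rows with the wrong width or non-numeric measurements."""
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    n_cols = len(header) - 1  # every column after the ID column holds one measurement
    for line_no, row in enumerate(body, start=2):
        if len(row) != n_cols + 1:
            problems.append(f"line {line_no}: expected {n_cols} values, got {len(row) - 1}")
            continue
        for cell in row[1:]:
            if cell == "":
                continue  # assumption: an empty cell marks a missing measurement
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {cell!r}")
    return problems


# Toy example: levels every 4 h over one day, two replicates each (values invented).
cols = [f"ZT{zt}_rep{r}" for zt in range(0, 24, 4) for r in (1, 2)]
table = write_upload_table(
    ["Per2", "Bmal1"],
    cols,
    [[1.0, 1.1, 2.0, 2.1, 3.5, 3.4, 3.0, 3.1, 1.8, 1.7, 0.9, 1.0],
     [3.2, 3.1, 2.0, 2.2, 0.8, 0.9, 1.1, 1.0, 2.1, 2.3, 3.0, 3.3]],
)
print(validate_upload_table(table))  # an empty list means no problems were found
```

Validating locally before upload catches the two most common layout mistakes (ragged rows and stray text in numeric cells) without a round trip to the server.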
After the file is uploaded, the server will run BIO_CYCLE on the uploaded file for three separate period ranges: 20 through 28 h, 10 through 14 h and 7 through 9 h. A separate deep neural network (DNN) is trained for each set of timepoints and for each period range. If the DNNs are already trained, then the results should be ready within about 1 min. If the DNNs are not trained, BIO_CYCLE will automatically train them and the results will be ready within about 2 min. The user can then visualize the results using various drop-down menus to select the period of interest and P-value and q-value cutoffs. As shown in Figure 2, given the selected period to investigate, and a P-value or q-value threshold, the web server will produce histograms of periods, lags, amplitudes and offsets.
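The post-processing described above (pick a period band, apply a P-value or q-value cutoff, then histogram the surviving estimates) can be sketched as follows. The q-values here come from the Benjamini-Hochberg procedure, a common choice that the text does not confirm for BIO_CYCLE; the result records, the 4 h bin width, and the helper names are hypothetical.

```python
from collections import Counter

# The three period ranges BIO_CYCLE is run on, per the text (hours).
PERIOD_BANDS = {"24 h": (20.0, 28.0), "12 h": (10.0, 14.0), "8 h": (7.0, 9.0)}


def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted P-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest P-value down so q-values are monotone in P.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def select(results, band: str, q_cutoff: float):
    """Keep results whose period lies in the chosen band and whose q-value passes the cutoff."""
    lo, hi = PERIOD_BANDS[band]
    return [r for r in results if lo <= r["period"] <= hi and r["q"] <= q_cutoff]


def phase_histogram(results, bin_hours: float = 4.0) -> dict[float, int]:
    """Count phases (lags) in fixed-width bins; keys are bin start times in hours."""
    counts = Counter((r["phase"] // bin_hours) * bin_hours for r in results)
    return dict(sorted(counts.items()))


# Toy results (invented): one record per molecular species.
results = [
    {"id": "a", "period": 24.0, "phase": 2.5, "p": 0.001},
    {"id": "b", "period": 23.5, "phase": 14.0, "p": 0.004},
    {"id": "c", "period": 12.0, "phase": 6.0, "p": 0.002},
    {"id": "d", "period": 25.0, "phase": 3.9, "p": 0.300},
]
for r, q in zip(results, bh_qvalues([r["p"] for r in results])):
    r["q"] = q

circadian = select(results, "24 h", q_cutoff=0.05)
print([r["id"] for r in circadian], phase_histogram(circadian))
# ['a', 'b'] {0.0: 1, 12.0: 1}
```

Species "c" is rhythmic but falls in the 12 h band, and "d" sits in the 24 h band but fails the q-value cutoff, which is why only "a" and "b" reach the histogram.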
Figure 2.

The BIO_CYCLE web server interface.

Another feature provided by CircadiOmics is the Metabolic Atlas web portal (http://circadiomics.ics.uci.edu/metabolicatlas), which allows researchers to generate and visualize interactive metabolic networks. These networks are derived from the KEGG database and can be filtered using BIO_CYCLE statistics (37). To create a metabolic network, users first select a dataset and a particular metabolite, and then choose one of six options for building the network. For example, one option displays a network of all metabolites oscillating in phase with the selected metabolite; another displays a network of all metabolites involved in the same pathways as the selected metabolite. Once the network is displayed, the user can filter out edges based on BIO_CYCLE statistics.
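Two of the network options mentioned above, linking metabolites that oscillate in phase with a selected metabolite and linking metabolites that share a pathway with it, can be sketched with circular phase differences and set intersections. The 2 h in-phase tolerance, the q-value cutoff used to prune edges, the record layout, and the metabolite and pathway names are all assumptions; the portal's actual criteria are not given in the text, and the toy data are invented.

```python
def circular_phase_diff(a: float, b: float, period: float = 24.0) -> float:
    """Smallest absolute difference between two phases on a circle of length `period` (hours)."""
    d = abs(a - b) % period
    return min(d, period - d)


def in_phase_network(stats, seed, tolerance=2.0, q_cutoff=0.05):
    """Edges (seed, m) to metabolites called rhythmic (q <= q_cutoff) whose phase is within
    `tolerance` hours of the seed's phase. Tolerance and cutoff are hypothetical defaults."""
    seed_phase = stats[seed]["phase"]
    return [
        (seed, m)
        for m, s in stats.items()
        if m != seed
        and s["q"] <= q_cutoff
        and circular_phase_diff(s["phase"], seed_phase) <= tolerance
    ]


def shared_pathway_network(pathways, seed):
    """Edges (seed, m) to every metabolite sharing at least one pathway with the seed."""
    seed_paths = pathways.get(seed, set())
    return [(seed, m) for m, p in pathways.items() if m != seed and p & seed_paths]


# Toy statistics and pathway memberships (hypothetical names and values).
stats = {
    "m1": {"phase": 23.0, "q": 0.01},
    "m2": {"phase": 0.5, "q": 0.02},   # in phase with m1 across midnight
    "m3": {"phase": 11.0, "q": 0.01},  # antiphase
    "m4": {"phase": 22.0, "q": 0.40},  # close in phase but not rhythmic
}
pathways = {
    "m1": {"pathway_A"},
    "m2": {"pathway_B"},
    "m3": {"pathway_A", "pathway_C"},
    "m4": set(),
}
print(in_phase_network(stats, "m1"), shared_pathway_network(pathways, "m1"))
```

Measuring phase distance on a 24 h circle matters here: a metabolite peaking at ZT 0.5 is only 1.5 h from one peaking at ZT 23, which a plain subtraction would report as 22.5 h.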

Improvements

Since its last publication, CircadiOmics has undergone substantial improvements, including a significant increase in the number and diversity of datasets available to its users. The number of queryable datasets has increased from 227 to over 290, and the number of species and experimental conditions has also grown. Beyond this increase in available data, the latest version of CircadiOmics and its automated data discovery pipeline have received several improvements. To capture as many relevant datasets as possible, we now use open-source web crawlers in addition to our in-house crawler. We have also broadened the keyword searches that the in-house crawler performs on published abstracts so that it can discover species and tissue types not previously represented in CircadiOmics; this change is what allowed us to find circadian experiments performed on Drosophila melanogaster, Danio rerio and Neurospora crassa. In addition, the latest, substantially improved version of BIO_CYCLE is available via the CircadiOmics web portal. The improvements to BIO_CYCLE include implementation in Python to take advantage of state-of-the-art deep learning software, the ability to handle missing time points, improved amplitude estimation, the addition of offset estimation, and modeling of real-world replicated experimental data to produce more realistic P-values. The previous version of BIO_CYCLE was implemented in R, which lacks convenient access to deep learning libraries that use Graphics Processing Units (GPUs) to speed up the training and testing of DNNs. 
In the previous version, we were restricted to training on slower Central Processing Units (CPUs) and therefore trained only a small three-layer network with 100 hidden units per layer. Because the new version of BIO_CYCLE is implemented in Python, it uses the PyTorch deep learning library to train significantly larger DNNs on GPUs (38). The larger DNNs substantially improve the handling of missing data (e.g. missing replicates). The latest BIO_CYCLE also uses real-world experimental data available via CircadiOmics, not only to evaluate performance and fairly compare the new BIO_CYCLE algorithm with other available algorithms, but also to fine-tune the algorithm and make its P-value estimates more accurate. Together, these new features allow researchers to perform end-to-end circadian analyses of their data and to compare and combine their data with other available datasets.
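To give a sense of scale for the earlier architecture (a three-layer network with 100 hidden units per layer), the number of trainable parameters in a fully connected network can be counted directly. The input width of 12 (for example, 6 time points with 2 replicates each), the reading of "three-layer" as three hidden layers, the single output unit, and the 1,000-unit comparison are all assumptions for illustration, not BIO_CYCLE's documented architecture.

```python
def mlp_param_count(layer_sizes: list[int]) -> int:
    """Trainable parameters of a fully connected network: weights plus biases per layer.

    layer_sizes lists every layer width from input to output, e.g. [12, 100, 100, 100, 1].
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical shapes: 12 inputs, three hidden layers, one output unit.
small = mlp_param_count([12, 100, 100, 100, 1])
larger = mlp_param_count([12, 1000, 1000, 1000, 1])
print(small, larger)  # 21601 2016001
```

Because the hidden-to-hidden weight matrices dominate the count, a tenfold increase in width grows the parameter count roughly a hundredfold, which is the kind of scaling where GPU training becomes practical and CPU training becomes slow.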

Applications

CircadiOmics has numerous and diverse applications. To name just a few, users can analyze a single dataset, analyze multiple datasets of the same omic type across different tissues or species, and analyze relationships between datasets of different omic types. This flexibility to perform comparative analyses has proven highly effective for biological discovery and hypothesis generation, and CircadiOmics has contributed to numerous studies published in high-impact journals (39–54). For example, in Koronowski et al., CircadiOmics was used to better understand the independence of the liver circadian clock. Using high-throughput transcriptomic and metabolomic data, the authors showed that the liver has independent circadian functions specific to metabolic processes, but that full circadian function in the liver depends on signals from other clocks (55). In Tognini et al., CircadiOmics was used to analyze metabolomic data from the suprachiasmatic nucleus (SCN) under various experimental conditions, revealing a sensitivity of brain clocks to nutrition (56). Finally, in Masri et al., CircadiOmics contributed to showing that lung cancer does not affect the core clock but instead specifically reprograms hepatic metabolism, demonstrating that a pathological condition in one tissue can influence circadian homeostasis in other tissues (57). We also performed our own aggregate analysis of the data available via CircadiOmics to better understand the overall hierarchical architecture of transcriptomic circadian regulation. For this analysis, we measured how frequently important regulators, such as transcription factors (TFs) and RNA-binding proteins (RBPs), are found to oscillate across all mouse and baboon datasets, as a way to quantify their importance in circadian regulation. The top oscillating TFs and RBPs in mice and baboons are shown in Figure 3. 
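The aggregate frequency analysis, counting how often each candidate regulator is called oscillating across datasets, reduces to a tally over per-dataset calls. In this sketch the per-dataset q-values, the 0.05 cutoff, the dataset names, the gene names, and the choice to count a gene absent from a dataset as non-rhythmic are all hypothetical; the text does not state the exact cutoff or normalization used.

```python
from collections import Counter


def oscillation_frequency(calls, regulators, q_cutoff=0.05):
    """Fraction of datasets in which each regulator is called rhythmic (q <= q_cutoff).

    calls maps dataset name -> {gene: q-value}; a gene absent from a dataset counts as
    not rhythmic there. Returns (gene, fraction) pairs, most frequent first, ties by name.
    """
    if not calls:
        return []
    hits = Counter()
    for gene_q in calls.values():
        for gene in regulators:
            if gene_q.get(gene, 1.0) <= q_cutoff:
                hits[gene] += 1
    n = len(calls)
    return sorted(((g, hits[g] / n) for g in regulators), key=lambda gf: (-gf[1], gf[0]))


# Toy per-dataset calls with hypothetical gene names and invented q-values.
calls = {
    "mouse_liver_1": {"GeneA": 0.001, "GeneB": 0.002, "GeneC": 0.30},
    "mouse_scn_1": {"GeneA": 0.01, "GeneB": 0.04, "GeneC": 0.02},
    "baboon_liver_1": {"GeneA": 0.03, "GeneC": 0.60},
    "baboon_heart_1": {"GeneB": 0.01},
}
regulators = ["GeneA", "GeneB", "GeneC", "GeneD"]
print(oscillation_frequency(calls, regulators))
```

Normalizing by the number of datasets, rather than reporting raw counts, keeps the ranking comparable when the mouse and baboon collections differ in size.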
We found that the circadian core clock appears with the highest frequency, closely followed by TFs and RBPs with known interactions with the core clock. Beyond the core clock, this analysis identified multiple TFs and RBPs important to circadian regulation, some of which are corroborated by evidence in the literature and others of which are novel. We were able to validate some of the novel findings with animal experiments. For example, our analyses identified FUS and EIF4B as potentially strong circadian regulators. Consistent with this result, reverse transcription-quantitative PCR (RT-qPCR) and western blot analyses (Figure 4) showed that the mRNA and protein levels of both genes have a diurnal rhythm in the liver in certain experiments, and FUS also showed diurnal rhythmicity in the SCN (Supplementary Materials). In addition, our analysis identified the TF MXI1 as a novel circadian factor, and RT-qPCR analyses detected rhythmic expression of MXI1 in mouse livers (Figure S4, Supplementary Materials). This analysis was repeated for a few other TFs to validate the computational findings (Supplementary Figure S1). In short, in vivo experiments confirmed the circadian expression of important genes predicted in computo by analyzing the data in CircadiOmics. Together, these findings show that CircadiOmics provides a strong foundation for understanding the organization of the circadian transcriptome on a large scale.
Figure 3.

Frequency analysis rediscovers the core clock as well as a few novel circadian regulatory TFs and RBPs. Highlighted genes are those validated in the in vivo experiments shown in Figure 4.

Figure 4.

Validation of computational analysis results by in vivo experiments. Wild-type (WT) mouse samples were obtained under ad libitum conditions. (A) RT-qPCR was used to determine the expression of novel circadian factors detected by computational analysis in the mouse liver. The results are displayed as percent increase/decrease from the mRNA level in mice at ZT 0. (B) Daily rhythms in protein expression of EIF4B in whole-cell lysate from the liver (n = 2). Representative immunoblots of EIF4B are shown. The line graph shows quantification of EIF4B normalized to α-tubulin. Values are expressed as a percentage of the value at ZT 0. (C) Chromatin recruitment of BMAL1 at the E-box motif in the EIF4B promoter. ChIP-qPCR assays were performed on dual cross-linked livers at ZT 8 and ZT 20 with antibodies against BMAL1 (n = 3 at ZT 8, n = 2 at ZT 20). *P < 0.05, Student's t test. (D) RT-qPCR was used to determine mRNA expression of the novel circadian factors detected by computational analysis in the liver (n = 5). The results are displayed as percent increase/decrease from the mRNA level in mice at ZT 0. (E) RT-qPCR was used to determine mRNA expression of novel circadian factors detected by computational analysis in the SCN (n = 2 at ZT 0; n = 3 at ZT 4, 8, 12, 16 and 20). The results are displayed as percent increase/decrease from the mRNA level in mice at ZT 0.


CONCLUSION

CircadiOmics allows users to seamlessly compare and analyze multiple omic time-series datasets simultaneously. For example, a user can compare transcripts across species or tissues, or map out relationships between metabolites, proteins and transcripts to identify underlying oscillatory trends. CircadiOmics has proven highly effective for performing end-to-end circadian analyses, from hypothesis generation to the creation of publication-ready figures. This web server has contributed to numerous studies published in high-impact journals and, in aggregate, has been cited in over 190 publications. The server receives approximately 1,000 queries per week from around the world and, to the best of our knowledge, is the largest single repository of circadian omic data available. With the quantity and breadth of its growing collection of high-quality circadian omic data, CircadiOmics continues to be an invaluable resource for understanding the fundamental landscape of circadian rhythms and how these rhythms are programmed, and can be reprogrammed, in cells, tissues, organs and organisms, with significant implications for medicine and therapeutic interventions.
REFERENCES (partial list; 56 in total)

1. Karsten Hokamp, Kenneth H Wolfe. PubCrawler: keeping up comfortably with PubMed and GenBank. Nucleic Acids Res. 2004-07-01.

2. L C Antunes, R Levandovski, G Dantas, W Caumo, M P Hidalgo. Obesity and shift work: chronobiological aspects (Review). Nutr Res Rev. 2010-02-02.

3. Shogo Sato, Astrid Linde Basse, Milena Schönke, Siwei Chen, Muntaha Samad, Ali Altıntaş, Rhianna C Laker, Emilie Dalbram, Romain Barrès, Pierre Baldi, Jonas T Treebak, Juleen R Zierath, Paolo Sassone-Corsi. Time of Exercise Specifies the Impact on Muscle Metabolic Pathways and Systemic Energy Homeostasis. Cell Metab. 2019-04-18.

4. Abdelhalim Azzi, Robert Dallmann, Alison Casserly, Hubert Rehrauer, Andrea Patrignani, Bert Maier, Achim Kramer, Steven A Brown. Circadian behavior is light-reprogrammed by plastic DNA methylation. Nat Neurosci. 2014-02-16.

5. Selma Masri, Thales Papagiannakopoulos, Kenichiro Kinouchi, Yu Liu, Marlene Cervantes, Pierre Baldi, Tyler Jacks, Paolo Sassone-Corsi. Lung Adenocarcinoma Distally Rewires Hepatic Circadian Homeostasis. Cell. 2016-05-05.

6. Shogo Sato, Kenneth A Dyar, Jonas T Treebak, Sara L Jepsen, Amy M Ehrlich, Stephen P Ashcroft, Kajetan Trost, Thomas Kunzke, Verena M Prade, Lewin Small, Astrid Linde Basse, Milena Schönke, Siwei Chen, Muntaha Samad, Pierre Baldi, Romain Barrès, Axel Walch, Thomas Moritz, Jens J Holst, Dominik Lutter, Juleen R Zierath, Paolo Sassone-Corsi. Atlas of exercise metabolism reveals time-dependent signatures of metabolic homeostasis. Cell Metab. 2022-01-13.

7. Carolina M Greco, Kevin B Koronowski, Jacob G Smith, Jiejun Shi, Paolo Kunderfranco, Roberta Carriero, Siwei Chen, Muntaha Samad, Patrick-Simon Welz, Valentina M Zinna, Thomas Mortimer, Sung Kook Chun, Kohei Shimaji, Tomoki Sato, Paul Petrus, Arun Kumar, Mireia Vaca-Dempere, Oleg Deryagin, Cassandra Van, José Manuel Monroy Kuhn, Dominik Lutter, Marcus M Seldin, Selma Masri, Wei Li, Pierre Baldi, Kenneth A Dyar, Pura Muñoz-Cánoves, Salvador Aznar Benitah, Paolo Sassone-Corsi. Integration of feeding behavior by the liver circadian clock reveals network dependency of metabolic rhythms. Sci Adv. 2021-09-22.

8. Oren Froy. Metabolism and circadian rhythms--implications for obesity (Review). Endocr Rev. 2009-10-23.

9. Sammy Alhassen, Siwei Chen, Lamees Alhassen, Alvin Phan, Mohammad Khoudari, Angele De Silva, Huda Barhoosh, Zitong Wang, Chelsea Parrocha, Emily Shapiro, Charity Henrich, Zicheng Wang, Leon Mutesa, Pierre Baldi, Geoffrey W Abbott, Amal Alachkar. Intergenerational trauma transmission is associated with brain metabotranscriptome remodeling and mitochondrial dysfunction. Commun Biol. 2021-06-24.

10. Jeffrey A Haspel, Sukrutha Chettimada, Rahamthulla S Shaik, Jen-Hwa Chu, Benjamin A Raby, Manuela Cernadas, Vincent Carey, Vanessa Process, G Matthew Hunninghake, Emeka Ifedigbo, James A Lederer, Joshua Englert, Ashley Pelton, Anna Coronata, Laura E Fredenburgh, Augustine M K Choi. Circadian rhythm reprogramming during lung inflammation. Nat Commun. 2014-09-11.
