Michael E Hughes¹, Katherine C Abruzzi², Ravi Allada³, Ron Anafi⁴, Alaaddin Bulak Arpat⁵,⁶, Gad Asher⁷, Pierre Baldi⁸, Charissa de Bekker⁹, Deborah Bell-Pedersen¹⁰, Justin Blau¹¹, Steve Brown¹², M Fernanda Ceriani¹³, Zheng Chen¹⁴, Joanna C Chiu¹⁵, Juergen Cox¹⁶, Alexander M Crowell¹⁷, Jason P DeBruyne¹⁸, Derk-Jan Dijk¹⁹, Luciano DiTacchio²⁰, Francis J Doyle²¹, Giles E Duffield²², Jay C Dunlap¹⁷, Kristin Eckel-Mahan²³, Karyn A Esser²⁴, Garret A FitzGerald²⁵, Daniel B Forger²⁶, Lauren J Francey²⁷, Ying-Hui Fu²⁸, Frédéric Gachon²⁹, David Gatfield⁵, Paul de Goede³⁰, Susan S Golden³¹, Carla Green³², John Harer³³, Stacey Harmer³⁴, Jeff Haspel¹, Michael H Hastings³⁵, Hanspeter Herzel³⁶, Erik D Herzog³⁷, Christy Hoffmann¹, Christian Hong²⁷, Jacob J Hughey³⁸, Jennifer M Hurley³⁹, Horacio O de la Iglesia⁴⁰, Carl Johnson⁴¹, Steve A Kay⁴², Nobuya Koike⁴³, Karl Kornacker⁴⁴, Achim Kramer⁴⁵, Katja Lamia⁴⁶, Tanya Leise⁴⁷, Scott A Lewis¹, Jiajia Li¹,⁴⁸, Xiaodong Li⁴⁹, Andrew C Liu⁵⁰, Jennifer J Loros⁵¹, Tami A Martino⁵², Jerome S Menet¹⁰, Martha Merrow⁵³, Andrew J Millar⁵⁴, Todd Mockler⁵⁵, Felix Naef⁵⁶, Emi Nagoshi⁵⁷, Michael N Nitabach⁵⁸, Maria Olmedo⁵⁹, Dmitri A Nusinow⁵⁵, Louis J Ptáček⁶⁰, David Rand⁶¹, Akhilesh B Reddy⁶², Maria S Robles⁵³, Till Roenneberg⁵³, Michael Rosbash², Marc D Ruben²⁷, Samuel S C Rund⁶³, Aziz Sancar⁶⁴, Paolo Sassone-Corsi⁶⁵, Amita Sehgal⁶⁶, Scott Sherrill-Mix⁶⁷, Debra J Skene⁶⁸, Kai-Florian Storch⁶⁹, Joseph S Takahashi⁷⁰, Hiroki R Ueda⁷¹, Han Wang⁷², Charles Weitz⁷³, Pål O Westermark⁷⁴, Herman Wijnen⁷⁵, Ying Xu⁷⁶, Gang Wu²⁷, Seung-Hee Yoo¹⁴, Michael Young⁷⁷, Eric Erquan Zhang⁷⁸, Tomasz Zielinski⁵⁴, John B Hogenesch²⁷.
Abstract
Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding "big data" that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them.
Keywords: ChIP-seq; RNA-seq; biostatistics; circadian rhythms; computational biology; diurnal rhythms; functional genomics; guidelines; metabolomics; proteomics; systems biology
Year: 2017 PMID: 29098954 PMCID: PMC5692188 DOI: 10.1177/0748730417728663
Source DB: PubMed Journal: J Biol Rhythms ISSN: 0748-7304 Impact factor: 3.182
Figure 1. The use of systems biology approaches has increased dramatically in the past 20 years. (A) Annual number of publications available on PubMed that contain the keywords “ChIP-seq,” “RNA-seq,” “Metabolomics,” “Proteomics,” and/or “Microarray.” These numbers were obtained directly from PubMed’s “Results by Year” section. (B) A Boolean search was used to restrict these counts to publications containing the chosen keyword together with the term “circadian,” “clock,” or both. Both plots show an increase in the use of functional genomics approaches in biology over the past 5 years, in particular RNA-seq, ChIP-seq, and metabolomics.
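The counts in Figure 1 were read from PubMed's “Results by Year” display. As an assumed programmatic alternative (not the procedure used for the figure), the Boolean query from panel B can be built as a string and sent to NCBI's E-utilities `esearch` endpoint with `rettype=count`. The helper names `pubmed_query` and `count_url` are hypothetical, and the exact-phrase quoting and the `[pdat]` publication-date tag are choices made here, so the resulting counts may not match the figure exactly.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; rettype=count asks for the hit count only.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(keyword, year, context_terms=("circadian", "clock")):
    """Boolean PubMed query: keyword AND (any context term), limited to one year.

    OR-ing the context terms matches records with "circadian", "clock", or both.
    Pass context_terms=() for an unfiltered keyword count (as in panel A).
    """
    parts = [f'"{keyword}"']
    if context_terms:
        parts.append("(" + " OR ".join(f'"{t}"' for t in context_terms) + ")")
    parts.append(f"{year}[pdat]")  # publication-date field tag
    return " AND ".join(parts)


def count_url(query):
    """Build (but do not fetch) the esearch URL returning the number of matches."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "rettype": "count"})
```

For example, `pubmed_query("RNA-seq", 2016)` yields `"RNA-seq" AND ("circadian" OR "clock") AND 2016[pdat]`. Fetching the URL from `count_url` (for instance with `urllib.request.urlopen`) returns an XML document whose `Count` element holds the number of matching records; looping over keywords and years would give one point per bar in the plots.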

Figure 2. Duplicating and concatenating time-series data results in unacceptable false-positive rates. Duplicating and concatenating data to generate an artificially long time series eliminates the statistical independence of samples. To investigate the consequences of this manipulation empirically, a randomly generated test set of 1000 arrhythmic time series composed entirely of Gaussian noise was used to compare the effects of duplication and concatenation on the false-positive rate. The first simulated experiment had a duration of 48 h, with a sampling interval of 2 h. The second simulation was composed of every other time point from the first run, resulting in a data set with a duration of 48 h and a sampling interval of 4 h. The third simulation used the first half of the second run, producing a data set with a duration of 24 h and a sampling interval of 4 h. JTK_Cycle was used to assess rhythmicity, with an adjusted p < 0.05 considered a “hit.” Without concatenation, each run produced conservative false-positive rates, with fewer than 2% of series called rhythmic in every scenario. The first concatenation increased the false-positive rate by at least 8-fold, the second by at least 13-fold, and the third by 18-fold compared with the initial rate.
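The inflation shown in Figure 2 can be reproduced in miniature. JTK_Cycle is not reimplemented here: as a stand-in, this sketch assumes a single-component cosinor fit with a fixed 24 h period and a zero-amplitude F-test, and it uses raw p < 0.05 rather than adjusted p values, so its rates will not match the figure. The function names `orthonormal_design`, `cosinor_pvalue`, and `false_positive_rate` are hypothetical. The mechanism is the same as in the figure: a copied series carries no new information, yet the test treats every copied point as an independent measurement.

```python
import math
import random


def orthonormal_design(times, period=24.0):
    """Orthonormalise the cosinor design columns [1, cos(wt), sin(wt)] (Gram-Schmidt)."""
    w = 2 * math.pi / period
    columns = [[1.0] * len(times),
               [math.cos(w * t) for t in times],
               [math.sin(w * t) for t in times]]
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:  # remove the component along each earlier basis vector
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis


def cosinor_pvalue(values, basis):
    """Zero-amplitude F-test: H0 is that the cos and sin coefficients are both 0."""
    coef = [sum(y * q for y, q in zip(values, col)) for col in basis]
    ss_total = sum(y * y for y in values) - coef[0] ** 2  # sum of squares about the mean
    ss_rhythm = coef[1] ** 2 + coef[2] ** 2                # explained by the fitted cosine
    df_resid = len(values) - 3
    f_stat = (ss_rhythm / 2) / ((ss_total - ss_rhythm) / df_resid)
    # The survival function of F(2, d) has the closed form (1 + 2f/d) ** (-d/2).
    return (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)


def false_positive_rate(n_points, interval_h, copies, n_series=1000, alpha=0.05, seed=0):
    """Fraction of pure-noise series called rhythmic after concatenating `copies` copies."""
    rng = random.Random(seed)
    times = [interval_h * k for k in range(n_points * copies)]  # artificially long time axis
    basis = orthonormal_design(times)
    hits = 0
    for _ in range(n_series):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_points)]  # arrhythmic: Gaussian noise only
        if cosinor_pvalue(y * copies, basis) < alpha:        # list repetition = concatenation
            hits += 1
    return hits / n_series


if __name__ == "__main__":
    # 48 h sampled every 2 h (24 points), analysed honestly and after 1-3 concatenations.
    for k in (1, 2, 3, 4):
        print(k, false_positive_rate(n_points=24, interval_h=2.0, copies=k))
```

With `copies=1` the rate should sit near the nominal 5%, and each additional copy pushes it higher; with this stand-in test, two copies of the 48 h, 2 h design flag roughly a quarter of the noise series. The reason is arithmetic: when the copied span covers whole periods, concatenating k copies leaves the fitted cosine unchanged but multiplies both sums of squares by k, so the F statistic grows by the factor (kn − 3)/(n − 3) and is also compared against the more lenient F(2, kn − 3) reference distribution.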

Figure 3. CircaInSilico generates synthetic time series for benchmarking analytical pipelines. (A) To simulate unique circadian data sets, CircaInSilico (https://5c077.shinyapps.io/Circa_in_Silico/) allows users to define the duration of the experiment, number of transcripts, number of replicates, amplitude range, period length, and percentage of rhythmic transcripts. (B) High-amplitude rhythmic time series simulated by CircaInSilico. The duration of the experiment was set to 48 h with no replication and a sampling interval of 4 h. The period length of the transcript was 24 h, and the amplitude range was set from −7 to 7 (arbitrary units). (C) Low-amplitude rhythmic time series simulated by CircaInSilico. The duration of the experiment was set to 48 h, with a sampling interval of 1 h. The period length was set to 24 h, with an amplitude range from −3 to 3 (arbitrary units). Each time point was replicated 3 times, and the trend line represents the average expression at each time point. (D) Arrhythmic time series simulated by CircaInSilico. The duration of the experiment was set to 48 h with no replication and a sampling interval of 2 h.
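To make the benchmarking idea behind Figure 3 concrete, here is a toy generator in the same spirit. It is not CircaInSilico's code (the app itself is hosted on shinyapps.io); the function name `simulate_dataset`, its parameter names and defaults, the cosine waveform, the uniform amplitude and phase draws, the Gaussian noise, excluding the end point of the sampling window, and reading an “amplitude range from −7 to 7” as a waveform spanning −A to A are all assumptions made for illustration.

```python
import math
import random


def simulate_dataset(duration_h=48, interval_h=4, n_transcripts=100, replicates=1,
                     period_h=24.0, amplitude_max=7.0, pct_rhythmic=10.0,
                     noise_sd=1.0, seed=0):
    """Hypothetical generator of synthetic time-series data with known ground truth.

    Returns (times, rows, labels): rows[i][j] is the list of replicate values of
    transcript i at times[j]; labels[i] is True if transcript i was simulated as rhythmic.
    """
    rng = random.Random(seed)
    n_points = int(duration_h // interval_h)  # end point of the window excluded (assumption)
    times = [interval_h * k for k in range(n_points)]
    n_rhythmic = round(n_transcripts * pct_rhythmic / 100)
    labels = [True] * n_rhythmic + [False] * (n_transcripts - n_rhythmic)
    rng.shuffle(labels)  # scatter the rhythmic transcripts through the data set
    rows = []
    for rhythmic in labels:
        # Rhythmic: cosine with random amplitude in [0, amplitude_max) and random phase.
        amp = rng.uniform(0.0, amplitude_max) if rhythmic else 0.0
        phase = rng.uniform(0.0, period_h)
        row = []
        for t in times:
            signal = amp * math.cos(2 * math.pi * (t - phase) / period_h)
            # Independent Gaussian noise for every replicate at every time point.
            row.append([signal + rng.gauss(0.0, noise_sd) for _ in range(replicates)])
        rows.append(row)
    return times, rows, labels
```

For example, `simulate_dataset(interval_h=1, replicates=3, amplitude_max=3.0)` roughly mirrors the low-amplitude design in panel C, and `pct_rhythmic=0` with `interval_h=2` gives an all-arrhythmic set like panel D. Because `labels` records which transcripts were generated as rhythmic, the calls made by any rhythm-detection method can be scored directly against it, for instance as the fraction of arrhythmic transcripts wrongly called rhythmic.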