Rosemary Braun, William L Kath, Marta Iwanaszko, Elzbieta Kula-Eversole, Sabra M Abbott, Kathryn J Reid, Phyllis C Zee, Ravi Allada.
Abstract
Circadian clocks play a key role in regulating a vast array of biological processes, with significant implications for human health. Accurate assessment of physiological time using transcriptional biomarkers found in human blood can significantly improve diagnosis of circadian disorders and optimize the delivery time of therapeutic treatments. To be useful, such a test must be accurate, minimally burdensome to the patient, and readily generalizable to new data. A major obstacle in development of gene expression biomarker tests is the diversity of measurement platforms and the inherent variability of the data, often resulting in predictors that perform well in the original datasets but cannot be universally applied to new samples collected in other settings. Here, we introduce TimeSignature, an algorithm that robustly infers circadian time from gene expression. We demonstrate its application in data from three independent studies using distinct microarrays and further validate it against a new set of samples profiled by RNA-sequencing. Our results show that TimeSignature is more accurate and efficient than competing methods, estimating circadian time to within 2 h for the majority of samples. Importantly, we demonstrate that once trained on data from a single study, the resulting predictor can be universally applied to yield highly accurate results in new data from other studies independent of differences in study population, patient protocol, or assay platform without renormalizing the data or retraining. This feature is unique among expression-based predictors and addresses a major challenge in the development of generalizable, clinically useful tests.
Keywords: circadian rhythms; cross-platform prediction; gene expression dynamics; machine learning
Year: 2018 PMID: 30201705 PMCID: PMC6166804 DOI: 10.1073/pnas.1800314115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
TimeSignature predictive genes
| Gene | Freq. | Gene | Freq. | Gene | Freq. |
| DDIT4 | 1.00 | GZMB | 0.58 | CAMKK1 | 0.17 |
| GHRL | 1.00 | CLEC10A | 0.50 | DTYMK | 0.17 |
| PER1 | 1.00 | PDK1 | 0.50 | NPEPL1 | 0.08 |
| EPHX2 | 0.92 | GPCPD1 | 0.50 | MS4A3 | 0.08 |
| GNG2 | 0.83 | MUM1 | 0.33 | IL13RA1 | 0.08 |
| IL1B | 0.83 | STIP1 | 0.33 | ID3 | 0.08 |
| DHRS13 | 0.83 | CHSY1 | 0.25 | MEGF6 | 0.08 |
| NR1D1 | 0.75 | AK5 | 0.25 | TCN1 | 0.08 |
| ZNF438 | 0.75 | CYB561 | 0.25 | NSUN3 | 0.08 |
| NR1D2 | 0.75 | SLPI | 0.25 | POLH | 0.08 |
| CD38 | 0.75 | PARP2 | 0.25 | SYT11 | 0.08 |
| TIAM2 | 0.75 | PGPEP1 | 0.17 | SH2D1B | 0.08 |
| CD1C | 0.75 | C12orf75 | 0.17 | REM2 | 0.08 |
| LLGL2 | 0.58 | FKBP4 | 0.17 | | |
A set of 41 genes is sufficient for accurate TimeSignature prediction. As described in the text, the predictors may vary slightly depending on the training data; the frequency with which each gene was selected as a predictor across 12 repeated runs using different training samples is given. Eighteen are selected the majority of the time; the remaining “auxiliary” predictors vary from run to run. Genes are sorted in order of selection frequency.
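The selection frequencies in the table are consistent with counts out of 12 runs (for example, 11/12 ≈ 0.92, 7/12 ≈ 0.58, 1/12 ≈ 0.08). A minimal Python sketch of how such a frequency could be tallied is shown here. The function name `selection_frequency` and the toy run contents are hypothetical and are not taken from the study's code or data.

```python
from collections import Counter


def selection_frequency(runs):
    """Return, per gene, the fraction of runs in which it was selected.

    `runs` is a list of gene-name lists, one list per training run.
    (Hypothetical helper for illustration only.)
    """
    # set() guards against a gene being listed twice within one run.
    counts = Counter(gene for selected in runs for gene in set(selected))
    n_runs = len(runs)
    freq = {gene: count / n_runs for gene, count in counts.items()}
    # Sort by descending frequency, then by name for a stable order.
    return dict(sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])))


# Toy input: 12 made-up runs, NOT the selections from the study.
runs = (
    [["PER1", "DDIT4", "NR1D1"]] * 9
    + [["PER1", "DDIT4", "FKBP4"]] * 2
    + [["PER1", "DDIT4"]]
)
freq = selection_frequency(runs)

for gene, value in freq.items():
    print(f"{gene}\t{value:.2f}")
```

With 12 runs, every frequency is a multiple of 1/12, which is why the table contains only values such as 0.08, 0.17, 0.25, and so on.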
Fig. 1. Accuracy of TimeSignature predictor applied to data from four distinct studies. Each column corresponds to the indicated dataset. The TimeSignature predictor was trained on a subset of subjects from the Möller–Levet study (37) and then applied to the remaining subjects (Test Set) along with three independent datasets: V1 (38), V2 (39), and V3. For each sample being predicted, two-point within-subject normalization was performed using that sample and a single other sample from the same subject 12 h away. In the top row, the predicted time of day vs. true time of day for each sample is shown. Dark and light gray bands indicate error ranges of ±2 h and ±4 h about the true time. For the first three studies, color of the point indicates experimental condition: in the Test Set, control (black) vs. sleep restriction (orange); in V1, control (black) vs. forced desynchrony (orange); in V2, control days (black), sleep deprived day (orange), and recovery day (blue). In V3, all subjects were in the control constant-routine condition (black). In the bottom row, we plot the fraction of correctly predicted samples for each study vs. the size of the error for the TimeSignature predictor (solid black), PLSR-based model (dashed purple), and ZeitZeiger (dotted cyan). Normalized area under the curves (nAUCs) and median absolute errors are listed for each. The TimeSignature median absolute error across all samples in all studies was 1:37.
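The error summaries in this caption can be illustrated with a short Python sketch. Several points are assumptions rather than statements from the source: errors are taken as circular differences on a 24-h clock (so the largest possible error is 12 h), the nAUC is assumed to be the area under the fraction-correct curve divided by that 12-h maximum, and the reported 1:37 is assumed to be in h:min format. All function names and the toy values are hypothetical.

```python
from statistics import median


def circular_error_h(predicted_h, true_h):
    """Absolute time-of-day error in hours, wrapped onto [0, 12]."""
    diff = abs(predicted_h - true_h) % 24.0
    return min(diff, 24.0 - diff)


def fraction_within(errors, tolerance_h):
    """Fraction of samples whose error does not exceed the tolerance."""
    return sum(e <= tolerance_h for e in errors) / len(errors)


def normalized_auc(errors, max_error_h=12.0, steps=1200):
    """Area under the fraction-correct vs. tolerance curve (trapezoid rule),
    divided by max_error_h. ASSUMED normalization, not from the source."""
    step = max_error_h / steps
    values = [fraction_within(errors, i * step) for i in range(steps + 1)]
    area = sum((values[i] + values[i + 1]) / 2.0 * step for i in range(steps))
    return area / max_error_h


def format_h_min(hours):
    """Format decimal hours as h:mm (assumed format of the reported error)."""
    h, m = divmod(round(hours * 60), 60)
    return f"{h}:{m:02d}"


# Toy predictions (hours on a 24-h clock), NOT data from the study.
predicted = [23.5, 1.0, 8.0, 14.0]
true = [0.5, 23.0, 9.5, 14.0]
errors = [circular_error_h(p, t) for p, t in zip(predicted, true)]

print("errors (h):", errors)
print("within 2 h:", fraction_within(errors, 2.0))
print("median error:", format_h_min(median(errors)))
print("nAUC:", round(normalized_auc(errors), 3))
```

Under this assumed normalization, errors spread uniformly over 0 to 12 h give an nAUC of about 0.5, and a perfect predictor gives 1.0.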
Fig. 2. Distribution of TimeSignature accuracy in the two-draw application as a function of elapsed time between the first and second draw (V3 data). Horizontal lines are given to guide the eye at nAUC = 0.5 (chance) and nAUC = 0.7 (generally considered good). Boxes are colored according to the absolute time-difference modulo full days (e.g., a 20-h interval corresponds to a 4-h difference in time of day, and thus both the 4-h and 20-h boxes are shaded yellow). A vertical line at 12 h indicates the axis of this symmetry.
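The folding of elapsed intervals onto a time-of-day difference, as described in this caption, can be sketched in a few lines of Python. The function name and the list of example intervals are hypothetical; only the 4-h/20-h pairing and the 12-h symmetry axis come from the caption.

```python
from collections import defaultdict


def time_of_day_difference_h(elapsed_h):
    """Fold an elapsed interval between two draws onto a
    time-of-day difference in the range [0, 12] hours."""
    phase = elapsed_h % 24.0
    return min(phase, 24.0 - phase)


# Group example intervals (hours) that share the same folded difference;
# intervals in one group would receive the same box color in the figure.
groups = defaultdict(list)
for elapsed in (4, 8, 12, 16, 20):
    groups[time_of_day_difference_h(elapsed)].append(elapsed)

for difference, intervals in sorted(groups.items()):
    print(f"{difference:g} h difference: intervals {intervals}")
```

Intervals mirror each other about 12 h, so 4 h pairs with 20 h and 8 h pairs with 16 h, while 12 h is its own partner.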