Francis C Motta, Robert C Moseley, Bree Cummins, Anastasia Deckard, Steven B Haase.
Abstract
BACKGROUND: Although they perform distinct functions, cell and circadian cycles both control a large fraction of cell and organismal physiology by regulating large periodic transcriptional programs that encompass anywhere from 15 to 80% of the genome. In each case, these programs are controlled by gene regulatory networks (GRNs), and genetics and chromosome mapping approaches in model systems have shown that at the core of these GRNs are small sets of genes that drive the transcript dynamics of the GRNs. However, it is unlikely that all of these core genes have been identified, even in model organisms. Moreover, large periodic transcriptional programs controlling a variety of processes certainly exist in important non-model organisms where genetic approaches to identifying networks are expensive, time-consuming, or intractable. Ideally, the core network components could be identified using data-driven approaches on the transcriptome dynamics data already available.
Keywords: Cell cycle; Circadian rhythms; Gene regulatory networks; Network inference; Transcription factors
Year: 2022 PMID: 35300586 PMCID: PMC8932128 DOI: 10.1186/s12859-022-04627-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Conceptual model of core regulatory elements. (A) Conceptual model of a transcriptional regulatory network with core nodes (squares) operating in a strongly connected subnetwork of mutual activation (arrows) and repression (short bars), together with outputs of the core (circles). Output nodes transmit the transcriptional signal that is generated by the core, but which diminishes as it moves away from core nodes. (B) Illustrations of transcript abundance profiles exhibited by the core and its output nodes, with core nodes having oscillations that precisely match a specified period (shaded region) and large variations in expression.
Quantitative metrics of periodicity and regulation strength used in this study to rank genes
| Name | Function | Type | Description |
|---|---|---|---|
| DL Per Score | Per(G) | Periodicity | A measure of abundance profile periodicity as defined by Eq. (3) |
| DL Per | | Periodicity | An empirical |
| JTK Per | | Periodicity | An analytic |
| DL Reg Score | Reg(G) | Regulation | A measure of the variability of transcript abundance about its mean expression level as defined by Eq. (2) |
| DL Reg | | Regulation | An empirical |
| PerReg | | Combined | The product of DL Per and DL Reg Scores |
| DL | | Combined | The original periodicity measure introduced in [ |
| DL | | Combined | A modified version of the original periodicity measure introduced by [ |
Refer to Additional file 5: Supplementary Information for equation definitions
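The PerReg entry above is described as the product of a periodicity score and a regulation score. A minimal sketch of ranking genes by such a product follows. The gene labels and score values are invented for illustration, and the function name `rank_by_product` is hypothetical; the actual Per(G) and Reg(G) definitions are those of Eqs. (3) and (2) in the supplementary information and are not reproduced here.

```python
def rank_by_product(per_scores, reg_scores):
    """Rank genes by the product of two per-gene scores, highest first.

    Both arguments map gene name -> score. Only genes present in both
    mappings are ranked. Illustrative helper, not the authors' code.
    """
    shared = per_scores.keys() & reg_scores.keys()
    combined = {gene: per_scores[gene] * reg_scores[gene] for gene in shared}
    # Descending by combined score; ties broken by name for determinism.
    return sorted(combined, key=lambda gene: (-combined[gene], gene))


# Hypothetical scores: geneA is periodic but weakly regulated,
# geneC is strongly regulated but barely periodic.
per = {"geneA": 0.9, "geneB": 0.8, "geneC": 0.2}
reg = {"geneA": 0.3, "geneB": 0.7, "geneC": 0.9}
ranking = rank_by_product(per, reg)
```

A product favors genes that score well on both criteria at once, so a gene with one high and one low score ranks below a gene that is moderately high on both.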
Fig. 2 Identifying core genes among transcription factors. Average precision of classifiers identifying core from non-core TFs among all TFs by combined metrics (A) and individual metrics (B) (Table 3), as well as the baseline average precision of a random classifier, for each dataset (Table 4).
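The average precision used in these comparisons can be computed directly from a ranked list. The sketch below is a generic implementation, not the authors' code, and the TF labels are placeholders. It assumes the ranking contains every positive item, and it takes the random-classifier baseline to be approximately the fraction of positives, which is the usual large-sample approximation.

```python
def average_precision(ranked_items, positives):
    """Mean of precision@k over the ranks k at which a positive appears.

    Assumes every positive occurs somewhere in ranked_items.
    """
    positives = set(positives)
    if not positives:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked_items, start=1):
        if item in positives:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(positives)


def random_baseline(n_positives, n_total):
    """Approximate expected average precision of a random ranking."""
    return n_positives / n_total


# Toy ranking: positives sit at ranks 1 and 3.
toy_ap = average_precision(["tf1", "tf2", "tf3", "tf4", "tf5"], {"tf1", "tf3"})

# Baseline for a dataset with 17 core TFs among 304 TFs.
baseline = random_baseline(17, 304)
```

With so few positives the baseline is small (about 0.056 for 17 core TFs among 304), so even a modest average precision can represent a large improvement over chance.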
Time series transcript abundance datasets used in this study
| Organism | S. cerevisiae | S. cerevisiae | M. musculus | M. musculus | A. thaliana | A. thaliana |
|---|---|---|---|---|---|---|
| Synch. in | Cell cycle | Cell cycle | Circadian | Circadian | Diurnal | Circadian |
| Technology | RNASeq | Microarray | Microarray | RNASeq | Microarray | Microarray |
| Period | 75 min* | 94 min* | 24 h | 24 h | 24 h | 24 h |
| Duration | 245 min | 254 min | 48 h | 42 h | 48 h | 48 h |
| Sampling interval | 5 min | 16 min | 2 h | 6 h | 4 h | 4 h |
| Timepoints/cycle | 15 | 5.875 | 12 | 4 | 6 | 6 |
| Reference | [ | [ | [ | [ | [ | [ |
| No. of genes | 5910 | 5718 | 19,750 | 18,388 | 22,484 | 22,484 |
| No. of TFs | 304 | 307 | 1373 | 1118 | 1415 | 1415 |
| No. of core | 17 | 17 | 15 | 14 | 11 | 11 |
LL_LDHC: Constant light and temperature; LDHC: 24 hour cycling light and temperature
*Cell-cycle period length was taken from the respective publication, which estimated period length using the CLOCCS algorithm [54]
Counts are based on post-processed datasets (see Materials and Methods)
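The Timepoints/cycle row follows from dividing each period by the spacing between consecutive samples. A short check of that arithmetic, using the period and sampling values listed in the table (converted to minutes), is sketched below; the function name is illustrative.

```python
def timepoints_per_cycle(period_min, interval_min):
    """Number of samples falling within one period of the oscillation."""
    return period_min / interval_min


HOUR = 60  # minutes

# (period, sampling interval) per dataset column, in minutes.
datasets = [
    (75, 5),
    (94, 16),
    (24 * HOUR, 2 * HOUR),
    (24 * HOUR, 6 * HOUR),
    (24 * HOUR, 4 * HOUR),
    (24 * HOUR, 4 * HOUR),
]
values = [timepoints_per_cycle(period, step) for period, step in datasets]
```

The resulting values range from 4 to 15 samples per cycle, so the datasets differ considerably in how finely each oscillation is resolved.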
Top 25 transcription factors ranked by the DL JTK metric
| Rank | S. cerevisiae MA | S. cerevisiae RNA | M. musculus MA | M. musculus RNA | A. thaliana LDHC | A. thaliana LL_LDHC |
|---|---|---|---|---|---|---|
| 1–25 | | | | | | |
| Recall | 76.5% | 70.6% | 66.7% | 28.6% | 36.4% | 45.5% |
LL_LDHC: Constant light and temperature; LDHC: 24 hour cycling light and temperature; MA: Microarray; RNA: RNAseq
*Core transcription factors in Additional file 2—Core Genes
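The Recall row reports the fraction of known core TFs that appear among the top 25 ranked TFs. A minimal sketch of that calculation is given below. The TF labels are placeholders, and the count of 13 core TFs in the top 25 is a hypothetical value chosen because 13 of 17 gives 76.5%; it is not a count stated in the table.

```python
def recall_at_k(ranked_items, positives, k=25):
    """Fraction of positive items found among the first k ranked items."""
    positives = set(positives)
    if not positives:
        raise ValueError("at least one positive item is required")
    found = positives.intersection(ranked_items[:k])
    return len(found) / len(positives)


# Placeholder ranking of 100 TFs: tf1 is ranked first, tf100 last.
ranked = [f"tf{i}" for i in range(1, 101)]

# Hypothetical core set of 17 TFs: 13 fall in the top 25, 4 do not.
core = [f"tf{i}" for i in range(1, 14)] + [f"tf{i}" for i in range(50, 54)]

recall_percent = round(100 * recall_at_k(ranked, core, k=25), 1)
```

Because the denominator is the number of core TFs, recall at a fixed cutoff is not directly comparable across datasets with different core-set sizes without keeping that size in mind.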
Fig. 3 Transcript abundance dynamics across DL JTK rankings of transcription factors. (A) Distribution of DL JTK ranks of core S. cerevisiae TFs among all TFs and time series expression of two core TFs: NDD1, which is highly ranked (rank 13), and MCM1, which is not highly ranked (rank 266). NDD1 and MCM1 act in a complex to regulate downstream targets. (B) Heatmaps of standardized gene expression profiles of the genes ranked (left) 1–25, (middle) 76–100, and (right) 276–300 by DL JTK. Within each subpanel, genes are ordered by peak expression.
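Preparing heatmap rows of this kind usually involves standardizing each profile and then ordering genes by the time of their peak. The sketch below assumes z-score standardization with the population standard deviation and ordering by the index of the maximum value; the caption does not state which standardization was used, and the gene labels and abundance values are invented.

```python
from statistics import mean, pstdev


def standardize(profile):
    """Z-score a profile: subtract its mean, divide by its population SD."""
    mu = mean(profile)
    sigma = pstdev(profile)
    if sigma == 0:
        # A flat profile carries no dynamics; map it to all zeros.
        return [0.0] * len(profile)
    return [(x - mu) / sigma for x in profile]


def order_by_peak(profiles):
    """Return (gene, standardized profile) pairs sorted by peak time index."""
    standardized = {gene: standardize(p) for gene, p in profiles.items()}
    return sorted(standardized.items(), key=lambda kv: kv[1].index(max(kv[1])))


# Invented abundance profiles sampled at four time points.
profiles = {
    "geneA": [1, 2, 5, 2],  # peaks at index 2
    "geneB": [4, 1, 1, 2],  # peaks at index 0
    "geneC": [0, 3, 1, 0],  # peaks at index 1
}
ordered = order_by_peak(profiles)
gene_order = [gene for gene, _ in ordered]
```

Standardizing first puts every row on a common scale, so the heatmap emphasizes the timing of expression rather than absolute abundance.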
Fig. 4 Identifying core genes among all genes. Average precision of classifiers identifying core from non-core TFs among all genes by (A) combined metrics and (B) individual metrics (Table 3), as well as the baseline average precision of a random classifier, for each dataset (Table 4).
Fig. 5 Identifying transcription factors among all genes. Average precision of classifiers identifying TFs from non-TFs among all genes by combined metrics and individual metrics (Table 3), as well as the baseline average precision of a random classifier, for each dataset (Table 4).
Interaction relationships between core TFs and the non-core TFs that appear in the top 25 TFs as ranked by DL JTK
| Gene | Targeted | Targets | Gene | Targeted | Targets | Gene | Targeted | Targets |
|---|---|---|---|---|---|---|---|---|
| FHL1 | ARNTL [ | RVE4 | ||||||
| FHL1 | ARNTL [ | CCA1 | ||||||
| ACE2 | HLF [ | CCA1 | ||||||
| FHL1 | CLOCK [ | CHE | ||||||
| ACE2 | CLOCK [ | CCA1 | ||||||
| FHL1 | CHE | |||||||
| FHL1 | CHE | |||||||
| SWI4 | CHE | |||||||
| ACE2 | ||||||||
| SWI4 | ||||||||
| FKH1 | ||||||||
| MBP1 | ||||||||
| MCM1 | ||||||||
| MBP1 | ||||||||
| MCM1 | ||||||||
| ACE2 | ||||||||
| FHL1 | ||||||||
*S. cerevisiae and A. thaliana interactions were determined by database searches of [13] and [14], respectively, and represent a range of direct and indirect evidence types, including the presence of binding motifs in regulatory regions and response to TF over-expression. M. musculus interactions were determined from the evidence gathered in the associated citation.
M. musculus non-core TFs were drawn from the MA dataset only, while non-core S. cerevisiae and A. thaliana TFs were drawn from the union of each pair of analyzed datasets.