A cell division cycle is a well-coordinated process in eukaryotes with cell cycle genes exhibiting a periodic expression over time. There is considerable interest among cell biologists to determine genes that are periodic in multiple organisms and whether such genes are also evolutionarily conserved in their relative order of time to peak expression. Interestingly, periodicity is not well-conserved evolutionarily. A conservative estimate of a number of periodic genes common to fission yeast (Schizosaccharomyces pombe) and budding yeast (Saccharomyces cerevisiae) ('core set FB') is 35, while those common to fission yeast and humans (Homo sapiens) ('core set FH') is 24. Using a novel statistical methodology, we discover that the relative order of peak expression is conserved in ∼80% of FB genes and in ∼40% of FH genes. We also discover that the order is evolutionarily conserved in six genes which are potentially the core set of signature cell cycle genes. These include ace2 (a transcription factor) and polo-kinase plo1, which are well-known hubs of early M-phase clusters, cdc18 a key component of pre-replication complexes, mik1 which is critical for the establishment and maintenance of DNA damage check point, and histones hhf1 and hta2.
A cell division cycle is a well-coordinated process in eukaryotes with cell cycle genes exhibiting a periodic expression over time. There is considerable interest among cell biologists to determine genes that are periodic in multiple organisms and whether such genes are also evolutionarily conserved in their relative order of time to peak expression. Interestingly, periodicity is not well-conserved evolutionarily. A conservative estimate of a number of periodic genes common to fission yeast (Schizosaccharomyces pombe) and budding yeast (Saccharomyces cerevisiae) ('core set FB') is 35, while those common to fission yeast and humans (Homo sapiens) ('core set FH') is 24. Using a novel statistical methodology, we discover that the relative order of peak expression is conserved in ∼80% of FB genes and in ∼40% of FH genes. We also discover that the order is evolutionarily conserved in six genes which are potentially the core set of signature cell cycle genes. These include ace2 (a transcription factor) and polo-kinase plo1, which are well-known hubs of early M-phase clusters, cdc18 a key component of pre-replication complexes, mik1 which is critical for the establishment and maintenance of DNA damage check point, and histones hhf1 and hta2.
A cell division cycle among eukaryotes consists of a sequence of four major phases, namely, the G1, S, G2 and the M phase. The G1 phase (also known as the Gap 1 phase) is a resting phase where the cells grow in size and prepare for synthesis during the S phase. Furthermore, G1 phase also serves as one of two major check points, where if DNA damage is detected then the cell is prevented from proceeding to the S phase (1,2). Cells which pass the G1 check point proceed to S phase where the DNA replication takes place. This phase is followed by the G2 or Gap 2 phase. Similar to G1, this phase serves as a check point to ensure cells with damaged DNA do not proceed to the M phase (or mitosis) where the cells divide to form two daughter cells.Genes participating in a cell division cycle have a cyclical pattern of expression with peak attained just before their function (3). There are several intrinsic differences among organisms in various aspects of cell division cycle. For instance, the amount of time spent by a cell in different phases varies. Fission yeast, Schizosaccharomyces pombe (S. pombe), cell spends almost 70% of its time in the G2 phase, whereas the budding yeast, Saccharomyces cerevisiae (S. cerevisiae), cell spends roughly quarter of its time in G2 phase. Arbidopsis thaliana has a relatively small G2 phase but a long S phase. In contrast to S. pombe, a human cell may spend substantially more time in the G1 phase than in G2 phase (See www.cyclebase.org). Also, the proportion of genes that are known to participate in cell division cycle varies with organisms. For example, it is estimated that there are twice as many periodic genes in S. cerevisiae as in S. pombe (4).Despite many such differences, researchers are interested in (i) identifying genes that are periodic in multiple organisms (referred to as ‘periodically conserved’ genes) (Figure 1); (ii) among periodically conserved genes, identifying those that are also conserved in their phase of peak expression (Figure 1). This has been an active area of research over the past several decades [cf. (3,5)]. With the advent of microarray technology, numerous microarray studies have been conducted on several model organisms such as S. cerevisiae (6–9), S. pombe (4,10–12), Homo sapiens (13) and Arabidopsis (14,15). Such large-scale genome wide data on multiple organisms provide an excellent opportunity to determine genes involved in cell cycle and study their functions. It also allows one to understand the similarities and dissimilarities in the cell cycle of various organisms. A useful database containing results from various cell cycle microarray experiments is available at www.cyclebase.org (16), henceforth referred as ‘cyclebase’. These microarray data allow biologists to debate the conservation of genes participating in a cell cycle and their times of peak expression (3,4,10,11, 12,17). Based on a comprehensive analysis of the above microarray data and other published data, Jensen et al. (3) concluded that both periodicity as well as phase of peak expression are evolutionarily poorly conserved. Earlier a similar conclusion was drawn by Rustici et al. (4) who concluded that only 40 or so orthologs are periodic in both species of yeasts.
Figure 1.
Conservation of periodicity and phase. In each panel, the vertical hash mark on the time axis represents the boundary of a phase.
Conservation of periodicity and phase. In each panel, the vertical hash mark on the time axis represents the boundary of a phase.Although the poor conservation of periodicity and the phase of peak expression for most cell cycle genes may be biologically plausible, as evolutionarily functions of some genes may have changed, one cannot ignore variability between and within studies that may have contributed to these findings. For instance, even within the same organism there are major differences among studies published in the literature. Recently, three different groups of researchers conducted a total of 10 microarray experiments on S. pombe [5 by Rustici et al. (4), 3 by Oliva et al. (11) and 2 by Peng et al. (12)]. As summarized by (11) and by Caretta-Cartozo et al. (17), the three studies disagreed considerably on the number of periodic genes. According to Caretta-Cartozo et al. (17), only 156 out of ∼5000 genes in S. pombe genome were declared to be periodic by all three studies, although individually each group identified at least three times as many periodic genes. Furthermore, even among genes that were found to be periodic in at least two of the three studies, there were disagreements among the studies in terms of time to peak expression of some genes. For instance, according to Peng et al. (12) cdc18, mob1, imp2 and cig2 peak during the G1 phase, whereas Oliva et al. (11), Rustici et al. (4) and cyclebase suggest that these are M phase genes.The above discrepancies among studies even in the same organism are not surprising and can be attributed to various factors such as, natural variability in the data, experimental conditions, etc. Factors such as these result in statistical variability and uncertainty in estimates of time to peak expression. Not much has been discussed in the literature regarding these issues. The problem becomes even more challenging when comparisons across multiple organisms are to be made. In such comparisons, the biological differences among organisms may be confounded by statistical uncertainties due to variability in the data. For example, Rustici et al. (4) concluded that cell cycle regulation of majority of genes is not conserved between S. pombe and S. cerevisiae. On the other hand, Oliva et al. (11) and Peng et al. (12) suggest greater amount of similarities between the two species of yeasts and infer conservation of regulatory mechanisms.Since cell division cycle is a carefully orchestrated process, it is reasonable to hypothesize that the relative order of peak expression among a core set of cell cycle genes may be conserved even if the phase of some genes may have been evolutionarily modified (Figure 2). Genes whose relative order of time to peak expression is conserved may not only have a well-defined function in cell division cycle, but may also have potential interactions or associations with each other. In this article, we develop a formal statistical methodology to test the hypothesis that the relative order of peak expression among a core set of cell cycle genes is conserved in a pair of organisms. Using this methodology, we investigate if the ‘core set FB’ of fission yeast genes have the same relative order of peak expression as their budding yeast orthologs. Similarly, we investigate if the ‘core set FH’ of fission yeast genes have the same relative order of peak expression as their human orthologs. The statistical procedure developed in this article is novel and could help biologists formulate and test other similar hypotheses.
Figure 2.
(I) Relative order of peak expression of genes A,B and C is not conserved in Species 1 and 2. (II) Relative order of peak expression of genes D, E and F is conserved in Species 1 and 2. Orthologs in Species 2 are denoted by lower case letters a, b, c, d, e and f. In each panel, the vertical hash mark on the time axis represents the boundary of a phase. Insets in each panel represent the time to peak expression in terms of phase of cell cycle.
(I) Relative order of peak expression of genes A,B and C is not conserved in Species 1 and 2. (II) Relative order of peak expression of genes D, E and F is conserved in Species 1 and 2. Orthologs in Species 2 are denoted by lower case letters a, b, c, d, e and f. In each panel, the vertical hash mark on the time axis represents the boundary of a phase. Insets in each panel represent the time to peak expression in terms of phase of cell cycle.
MATERIALS AND METHODS
Relative order of peak expression among cell cycle genes
For a gene D, the time to its peak expression can be described in terms of an angle on a unit circle, known as the phase angle, which is denoted by ϕ. Suppose D, E and F are three cell cycle genes where D is an S phase gene, E is an early G2 phase gene and F is a mid-G2 phase gene, then they satisfy the relative order; D followed by E which is followed by F and which is followed by D. We represent this relative order of peak expression among the three genes by ϕ ≺ ϕ ≺ ϕ ≺ ϕ (or equivalently, D ≺ E ≺ F ≺ D). Suppose S. cerevisiae genes D, E and F are the orthologs of S. pombe genes d, e and f, respectively. Suppose D ≺ E ≺ F ≺ D, then we say that the relative order is conserved among the orthologs if d ≺ e ≺ f ≺ d (see Figure 2). In the ideal setting, if functions of all genes in the core set are conserved through evolution and if cell cycle is a well-ordered mechanism of nature, then it is reasonable to hypothesize that the relative order of expression of genes in the core set is conserved between the two organisms. Note that the relative order is invariant of the location of the pole of the circle. This is important for several reasons. First, biologically, the order of genes around the circle has no bearing on where the pole of the circle is established (i.e. it is rotation invariant). Secondly, a common challenge with time course cell cycle experiments is that one cannot be sure about the exact biological time when the cells were arrested to define the pole precisely. Also, different labs and experiments may arrest cells during different phases of cell cycle. Consequently, it can be challenging to compare phase angles across experiments since each experiment may have a different pole. However, the relative order of genes should be invariant of the location of the pole. Also, it is important to note that our definition of conservation of relative order does not require the orthologs pairs (D, d ), (E, e), (F, f ), etc. to have same phase angles or even the same phases (see the right panels in Figure 2). We just require d, e, f to satisfy the same relative order as D, E, F.Using Rustici et al. (4), Oliva et al. (11) and cyclebase, we arrived at the core set FB of 35 S. pombe cell cycle genes that are periodic in both yeasts (Table 1). Similarly, using cyclebase we arrived at the core set FH of 24 S. pombe cell cycle genes that are periodic in both S. pombe as well as H. sapiens (Table 2). We limited our core sets to include only those genes whose cycelbase periodicity rank is ≤500. The rank cut-off of 500 was arbitrarily chosen. Our point is that genes with higher ranks are less likely to be periodic with estimated phase angles subject to small concentration parameter, resulting in large uncertainty estimates.
Table 1.
Saccharomyces pombe genes in the core set FB arranged according to the relative order of S. cerevisiae orthologs
Relative order
Saccharomyces cerevisiae
Saccharomyces pombe
Gene
CBase Rank
Peaktime (CBase) (deg.)
Phase (CBase)
Gene
CBase Rank
FSA order
1
CDC6
500
349.2
M
cdc18
12
18
2
RFA1
6
68.4
G1
ssb1
56
12
3
RNR1
54
72
G1
cdc22
8
2
4
MSH6
16
72
G1
msh6
50
10
5
MRC1
192
72
G1
mrc1
33
23
6
POL1
112
72
G1
pol1
61
27
7
SMC3
55
72
G1
psm3
73
14
8
MCD1
11
75.6
G1
rad21
94
16
9
RAD51
34
75.6
G1
rhp51
275
34
10
CLN2
8
79.2
G1
cig2
38
9
11
POL2
84
79.2
G1
pol2
128
32
12
SWE1
77
97.2
S
mik1
51
11
13
HHT2
28
133.2
S
h3.3
20
5
14
HHF1
30
136.8
S
hhf1
17
3
15
HHT1
13
140.4
S
hht3
26
6
16
HTA2
12
144
S
hta2
31
7
17
HTB2
5
144
S
htb1
19
4
18
HTZ1
188
176.4
S
pht1
122
31
19
KIP3
390
165.6
S
klp5
69
28
20
FKH1
85
180
S/G2
fkh2
35
8
21
SWI5
124
234
G2
ace2
15
19
22
BUD4
229
234
G2
mid2
28
22
23
CDC5
117
234
G2
plo1
46
24
24
MOB1
177
248.4
G2
mob1
119
30
25
ASE1
111
252
G2
mcp1
359
35
26
MYO1
253
252
G2
myo3
49
26
27
CHS2
87
252
G2
chs2
59
13
28
HOF1
113
255.6
G2
cdc15
11
17
29
HOF1
113
255.6
G2
imp2
76
29
30
KIN3
104
291.6
G2/M
fin1
47
25
31
DBF2
72
298.8
M
sid2
83
15
32
SST2
471
324
M
rgs1
159
33
33
CDC20
24
338.4
M
slp1
2
1
34
PST1
100
0
M/G1
SPAC1705.03C
24
21
35
DSE4
435
21.6
G1
eng1
21
20
CBase = Cyclebase
Table 2.
Saccharomyces pombe genes arranged in the core set FH according to the relative order of H. sapiens orthologs
Relative order
Homo sapiens
Saccharomyces pombe
Gene
CBase Rank
Peaktime (CBase) (deg.)
Phase (CBase)
Gene
CBase rank
FSA order
1
IFIT2
313
10.8
G1
ssn6
249
19
2
LRRC56
496
43.2
G1
sds22
384
22
3
ZNF367
48
140.4
G1
ace2*
15
3
4
CDC6
40
165.6
G1
cdc18*
12
2
5
PKMYT1
231
187.2
S
mik1*
51
13
6
HIST2H4B
219
194.4
S
hhf1*
17
11
7
DHFRL1
213
208.8
S
dfr1
473
24
8
SH3GL2
332
219.6
S
SPBC19C2.10
310
21
9
H2AFX
83
259.2
S/G2
hta2*
31
4
10
HRSP12
116
262.8
G2
mug71
155
14
11
PCBP1
189
266.4
G2
rnc1
186
16
12
TOP2A
14
291.6
G2
top2
226
18
13
PIF1
35
302.4
G2
pif1
88
8
14
FOXM1
107
309.6
G2
fkh2*
35
12
15
Hsp40
203
320.4
G2
SPCC63.13
216
17
16
CDC25B
64
324
G2/M
cdc25
158
15
17
AURKA
4
324
G2/M
ark1
265
20
18
KIF10
7
327.6
M
klp5*
69
7
19
CCNB1
49
334.8
M
cig2*
38
5
20
PLK1
1
334.8
M
plo1*
46
6
21
MAPK13
3
334.8
M
spm1
97
10
22
CDC20
24
334.8
G2
slp1*
2
1
23
API4
38
338.4
M
bir1
453
23
24
RAD21
80
345.6
M
rad21*
94
9
Genes with * have periodic S. cerevisiae orthologs, CBase = Cyclebase
Saccharomyces pombe genes in the core set FB arranged according to the relative order of S. cerevisiae orthologsCBase = CyclebaseSaccharomyces pombe genes arranged in the core set FH according to the relative order of H. sapiens orthologsGenes with * have periodic S. cerevisiae orthologs, CBase = CyclebaseIn the case of human orthologs, we relied completely on the peaktime specified by the cyclebase database to arrive their relative order (Table 2). However, in the case of budding yeast orthologs, in addition to cyclebase, we used published literature and Saccharomyces Genome Database (http://www.yeastgenome.org/) to arrive at the relative order (Table 1).We now describe the relative order of the 35 S. cerevisiae orthologs. Since the mRNA level as well as its protein level peaks during the early stages of G1 phase and is the precursor for DNA synthesis, therefore we begin with CDC6. This gene is followed by several G1 phase genes such as those involved in DNA repair, replication and check point (Replication Factor Alpha, RNR1, MSH6, MRC1 and POL1), cohesion of sister chromatids (SMC3 and MCD1), recombinational repair of double-strand breaks in DNA (RAD51), activation of Cdc28p to promote transition from G1 to S phase (CLN2, a late G1 phase cyclin) and DNA synthesis during DNA repair (POL2). The S phase genes that followed the G1 phase genes are those involved in: regulation of G2/M transition by inhibition of Cdc28p kinase activity (SWE1), chromatin assembly (histones such as HHT2, HHF1, HHT1, HTA2, HTB2, HTZ1) and mitotic spindle position (KIP3). These are followed by G2 phase transcription factors such as FKH1 and SWI5. Several G2 phase genes considered here have proteins involved in important functions such as: bud site selection (BUD4), cytokinesis and septation (CDC5, MOB1, ASE1, MYO1, CHS2, HOF1). Note that we considered both S. pombe orthologs of HOF1, namely, cdc15 and imp2 in our analysis. These genes are followed by KIN3, G2/M check point gene whose protein Kin3 plays a critical role in DNA damage recognition before the cell enters M phase (18). Among the M phase genes in the proposed relative order, DBF2 and CDC20 have a function for cells to exit from mitosis, while PST1 has function in the construction of cell wall. Our proposed relative order concluded with DSE4, a daughter cell-specific protein which degrades the cell wall causing the daughter cell to separate from the mother cell. Hence, it is logical that DSE4 was the last gene in our proposed relative order before returning to CDC6. According to cyclebase, some of the S. cerevisiae orthologs in the core set FB have identical peaktimes hence we assigned identical phase angles to all such genes in the null hypothesis. Specifically, following genes within parenthesis were hypothesized to have the same phase angles: (cdc22, msh6, mrc1, pol1, psm3), (rad21, rhp51), (cig2, pol2), (ace2, mid2, plo1), (mcp1, myo3, chs2) and (cdc15, imp2). The resulting relative order is described in Figure 3, where genes that were hypothesized to have same phase angle are along the same ray from the center of the circle (ray not drawn). See also Table 1. Known biological functions of genes in the core set FB are provided in Supplementary Table S1. The goal of this study is to test whether the S. pombe genes satisfy the relative order specified by the S. cerevisiae orthologs. Thus we tested the null hypothesis that the phase angle of cdc18 is followed by the phase angle of ssb1, … , SPAC1705.03c which in turn is followed by the phase angle of eng1 which in turn is followed by the phase angle of cdc18 against the alternative hypothesis that this order is not true. More precisely:
Figure 3.
Saccharomyces pombe genes arranged according to the relative order and approximate locations of their S. cerevisiae orthologs (in parenthesis). Sectors drawn are according to S. pombe cell cycle.
H1 :H0 is not trueSaccharomyces pombe genes arranged according to the relative order and approximate locations of their S. cerevisiae orthologs (in parenthesis). Sectors drawn are according to S. pombe cell cycle.Similarly, we tested the following hypotheses to see whether the S. pombe genes satisfy the relative order specified by the 24 H. sapiens orthologs:H1 :H0 is not trueThere are two reasons (biological and statistical) for the above formulation of null and alternative hypotheses. First, as stated earlier, the cell division cycle is a fundamental process in eukaryotes and one would expect various aspects of this process to be conserved through evolution. This is the basic premise of many recent papers [e.g. (4)] which tried to identify genes that are periodic in multiple organisms. Since our investigation is based on such conserved genes, the conservation of the relative order should be the null hypothesis rather than the alternative hypothesis. Basically, among genes that are declared to be conserved between species, our null hypothesis states that their relative order is also conserved. There is also a statistical reason for our choice of null and alternative hypotheses. If the null hypothesis was that the relative order is not conserved among the q genes in the two organisms, then the null hypotheses would contain (q − 1)! configurations of parameters and the alternative hypothesis contains only one configuration. As q increases, the null hypothesis is too large and is never likely to be rejected. For example, if q = 35 the total number of possible null configurations are of the order 1040, which is extremely large. No statistical test would have sufficient power to reject the null hypothesis in such situations. In fact, the power will go to zero as q increases!
Statistical test
For each gene i, i = 1, 2, … , q and experiment j, j = 1, 2, … , E, we model the unconstrained estimator of phase angle of peak expression θ, obtained from the Random Period Model (RPM) (19), using the von Mises distribution (VM). This distribution plays an important role in circular data analysis similar to the normal distribution for Euclidean data. Thus, we assume that where ϕ is the true unknown phase angle of peak expression of gene i in the j-th experiment. The concentration parameter κ represents the uncertainty associated with θ. We assume that κ depends on experiment j but not on gene i. There are two sources of uncertainty associated with the phase angle estimate of each gene, one is specific to the gene and the other is due to the experiment (which is common to all genes within the experiment). This resembles the classical mixed effects linear model in Euclidean space data. Since the number of time points used in each of the time course experiments considered in this article is fairly large, for any specific gene, the uncertainty associated with the estimator of the phase angle based on the RPM is negligible relative to the uncertainty due to the experiment. For this reason, we only retained the uncertainty component corresponding to the experiment.For each experiment j, j = 1, 2, … , E, our problem of interest is to the test the following hypotheses:H1 :H0 is not true.Let be a simple order constraint starting at index I and let ϕ = (ϕ1, ϕ2, … , ϕ)′. Then, for each j = 1, 2, …, E, the above null hypothesis can be rewritten as ϕ ∈ C, where .Let denote the restricted maximum likelihood estimator of ϕ subject to the constraint ϕ ∈ C (20). determines a partition f = {1, … , I} into sets of consecutive coordinates on which is constant. These sets are called level sets. Then, we construct the following test statistic to test the above hypothesesNotice that T is a measure of the angular distance between θ and . Our conditional test T, described in detail in the Supplementary Data, rejects the null hypothesis whenever T ≥ c(m), where m is the number of level sets for , c(m) is chosen so that , and ϕ0 is any point in the null hypotheses for which all coordinates are equal. Since in practice, κ are unknown, in order to derive a proper value for these parameters, we obtained its maximum likelihood estimator using an analysis of variance approach based on the von Mises distribution (see Section 1.3 in Supplementary Data).Theoretical details of T are given in Section 1.4 of Supplementary Data. There we demonstrate that our proposed test is an asymptotic α level test. The derivation of this latter property is not straightforward as several statistical issues arise as a result of the specific characteristic of the testing problem, namely, a complicated null hypothesis for a directional parametric model.
Lack of fit criterion for a given relative order
For a relative order specified by the null hypothesis, let p denote the P-value associated with the j-th experiment, j = 1, 2, … , E, and let . Note that L always lies between 0 and ∞, with smaller value corresponding to better fit. In the extreme case, if the presumed relative order is perfectly satisfied within each experiment with P-value of 1, then L = 0. Thus, among a collection of plausible orders for a set of cell cycle genes, a biologist may choose the order that corresponds to the smallest value of L. Note that under the null hypothesis, if the P-values are independently and uniformly distributed in the interval (0, 1), then 2L is distributed as a central χ2 random variable with E degrees of freedom. This is often known as Fisher's method of combining P-values and yields a formal statistical test.
RESULTS
Using the 10 S. pombe time course experimental data (4,11,12), we first obtained the unconstrained phase angle estimates of genes in the core sets FB and FH (Supplementary Tables S2 and S3) which are then used for testing various hypotheses described in this article. Using the estimates in Supplementary Table S2, we tested the hypotheses appearing in Equation (1) that all 35 S. pombe genes in FB satisfy the relative order specified by the S. cerevisiae orthologs against the alternative hypothesis that they are not. The null hypothesis is rejected at P ≤ 0.15 in 5 out of 10 experiments (Table 3). Of these five experiments, two have a P < 0.0001. If the null hypothesis was true in each of the 10 experiments, then the binomial probability of observing two or more experiments (out of 10) with a P = 0.0001 is 4.49 × 10−7, which is extremely small. This suggests that the relative order hypothesized in Equation (1) may not be true and thus the 35 S. pombe genes do not follow the same relative order as their S. cerevisiae orthologs. Of course, in the above argument we implicitly assume that the outcomes of the 10 experiments are identically and independently distributed. Although this is a commonly made assumption, we acknowledge that it may be restrictive.
Table 3.
Test for relative order of S. pombe genes in the core set FB (Order specified by S. cerevisiae orthologs)
Experiment
P-values
Based on all 35
Based on final 28
Saccharomyces pombe genes
Saccharomyces pombe genes
Oliva cdc
0.08
0.53
Oliva elut1
0.69
0.98
Oliva elut2
0.14
0.41
Peng cdc
0.24
0.99
Peng elut
0.57
0.99
Rust cdc1
5.37E-11
0.34
Rust cdc2
0.19
0.86
Rust elut1
0.07
0.88
Rust elut2
0.24
0.99
Rust elut3
2.88E-05
0.53
Lack of fit
46.75
3.57
Test for relative order of S. pombe genes in the core set FB (Order specified by S. cerevisiae orthologs)A question of interest is whether we can identify a subset of the 35 genes that conserve the relative order between the two yeasts. Since the number of all possible subsets (of various sizes) is extremely large, it would be practically impossible to enumerate all possible subsets of all sizes and then test the null hypotheses such as the one appearing in Equation (1) for each subset. This problem resembles the classical problem of selection of variables (or model selection) in linear regression analysis. Accordingly, we developed a Forward Selection Algorithm (FSA), which is described in the Supplementary Data. Similar to forward selection procedure in classical linear regression analysis, the FSA proceeds systematically by entering one gene at a time into the test for relative order according to its periodicity rank assigned by the cyclebase. Smaller the rank, the more periodic the gene is and hence its phase angle estimate is more likely to be reliable. The proposed FSA begins with all ortholog pairs that have a cyclebase rank <100. Thus, a gene is included in Step 1 of FSA if both fission yeast as well as the budding yeast orthologs of the gene has a rank <100. Details of the subsequent steps and the implementation of FSA are provided in the Supplementary Data.Using FSA (Supplementary Table S4), we discover that 28 out of 35 S. pombe genes, namely, cdc18, ssb1, cdc22, msh6, mrc1, pol1, psm3, rad21, cig2, pol2, mik1, h3.3, hhf1, hht3, hta2, htb1, pht1, klp5, fkh2, ace2, plo1, chs2, cdc15, imp2, sid2, slp1, SPAC1705.03C, eng1, potentially satisfy the same order as their S. cerevisiae orthologs. Thus, the relative order of these 28 genes seems to be conserved between the two species of yeasts. For these genes, the null hypothesis is rejected in none of the experiments even at a level of significance as high as 0.30 (Table 3). It is also interesting to note that the lack of fit criterion L based on all 35 genes was 46.75 and it dropped to 3.57 for the above 28 genes selected by FSA.Similar to genes in FB, we also tested the Equation (2) for genes in the core set FH and found that the relative order was rejected in 6 out of 10 experiments at a P < 0.001 (Table 4). Using FSA we found ace2, cdc18, mik1, histones (hhf1, hta2), rnc1, top2, cdc25, plo1 and slp1, to satisfy the same relative order as their human orthologs (Table 4). Among these 10 genes, ace2, cdc18, mik1, histones (hhf1, hta2) and plo1 also satisfied the relative order specified by their S. cerevisiae orthologs. Recall that, evolutionarily, humans and fungi are ∼1.5 billion years apart and budding yeast and fission yeasts are nearly billion years apart (21). Thus, it appears that the above six genes are evolutionarily conserved in their relative order of peak expression during the cell division cycle (Figure 4 and Table 5). These six genes are well known in the literature to play a critical role during cell division cycle. For example, the transcription factor ace2 and the polo-kinase plo1 are well-known hubs of early M phase clusters (22), the cell cycle gene cdc18 is a key component of pre-replication complexes for the onset of S phase (23), histones hhf1, hta2 play an important role during the S phase and mik1 is critical in the establishment and maintenance of DNA damage check point (24).
Table 4.
Test for relative order of S. pombe genes in the core set FH in the 10 experiments (Order specified by H. sapiens orthologs)
Experiment
P-values
Based on all 24
Based on final 10
Saccharomyces pombe genes
Saccharomyces pombe genes
Oliva cdc
0.03
0.60
Oliva elut1
0.10
0.93
Oliva elut2
0.19
0.95
Peng cdc
1.10E-03
0.95
Peng elut
0.07
0.83
Rust cdc1
4.12E-10
1
Rust cdc2
2.37E-06
0.93
Rust elut1
0
0.92
Rust elut2
1.90E-13
0.93
Rust elut3
0.06
1
Lack of fit
>100 000
1.10
Figure 4.
A core set of signature cell cycle genes with relative order of time to peak expression conserved among S. pombe, S. cerevisae and H. sapiens. Sectors and approximate locations of genes are drawn according to S. pombe. S. pombe genes are in green, S. cerevisiae orthologs are in red and H. sapiens orthologs are in blue.
Table 5.
A core set of signature cell cycle genes
Saccharomyces pombe gene (Saccharomyces cerevisiae, Homo sapiens orthologs)
Saccharomyces pombe
Saccharomyces cerevisiae
Homo sapiens
plo1 (CDC5, PLK1)
G2
G2
M
ace2 (SWI5, ZNF367)
G2/M
G2
G1
cdc18 (CDC6, CDC6)
M
M
G1/S
mik1 (SWE1, PKMYT1)
M
G1/S
S
hhf1 (HHF1, HIST2H4B)
G1/S
S
S
hta2 (HTA2, H2AFX)
G1/S
S
S/G2
Cell cycle phases are obtained from cyclebase
A core set of signature cell cycle genes with relative order of time to peak expression conserved among S. pombe, S. cerevisae and H. sapiens. Sectors and approximate locations of genes are drawn according to S. pombe. S. pombe genes are in green, S. cerevisiae orthologs are in red and H. sapiens orthologs are in blue.Test for relative order of S. pombe genes in the core set FH in the 10 experiments (Order specified by H. sapiens orthologs)A core set of signature cell cycle genesCell cycle phases are obtained from cyclebaseTo ensure that our statistical test has sufficient power to detect the alternative hypothesis, i.e. reject the null hypothesis that the genes in both species satisfy the same relative order, we conducted a simulation study for the fission and budding yeast data by randomly permuting the order of the genes in Step 1 of FSA and applied the algorithm. We considered 100 permutations and performed the first step of FSA on each permuted data. The null was rejected for all 100 permutations. We also found that in at least 5 out of the 10 experiments the P <0.05 and this occurred in every one of the 100 random permutations we considered. Note that the binomial probability of observing a P-value of 0.05 in at least 5 experiments out of 10 experiments by random chance is 6.36 × 10−5, which is a very unlikely event. Yet, in all 100 random permutations we found 5 out of 10 experiments to have a P <0.05, thus suggesting that our test is reasonably powerful to reject the null hypothesis of relative order if the hypothesis is not true. In our simulation study, we did not investigate the power of our test for alternatives where the order among the genes is not well conserved but not entirely random order. As with any statistical test, there will be a reduction in power as we get closer to the null hypothesis. In other words, if the true order is a very minor perturbation of the null hypothesis then probability of rejecting the null hypothesis would be smaller than when true order is substantially different from the null hypothesis. In a future project, we plan to investigate this problem in greater detail.
DISCUSSION
Since cell cycle genes follow a synchronized pattern of expression (25), one may speculate that some of the cell cycle genes are functionally conserved through evolution. There is an intrinsic order to the peak expression among the cell cycle genes so that they are converted into proteins in a well-synchronized manner to execute their respective functions during cell cycle. Consequently, the relative timing of peak expression of some of the cell cycle genes must be conserved through evolution.Often the order among genes is determined using heat maps and published literature. There does not seem to exist a formal statistical methodology to test hypothesis regarding the order among genes in a given experiment. In this article, we have developed a novel statistical methodology that can be used for testing relative order among the phase angles of cell cycle genes. Using the methodology developed in this article, we demonstrated that a core subset of 28 S. pombe genes have the same relative timing of peak expression as their S. cerevisiae orthologs. This number increases to 32 if we reduce the stringency of our criterion. Thus, it may be reasonable to infer that among the 35 genes in the core set FB, at least 80% satisfy the same relative order of peak expression as their S. cerevisiae orthologs. Similarly, ∼40% of the FH core set genes (10 out of 24) satisfy the same relative order of peak expression as their H. sapiens orthologs.Although this article takes the first step toward a formal statistical methodology for answering questions about conservation of the relative order cell cycle genes, it is important to acknowledge that, analogous to classical linear regression analysis, one may consider other alternatives to FSA and derive an improved algorithm.In this article, we took a ‘conservative approach’ when formulating the hypotheses appearing in Equations (1) and (2). Note that the average cyclebase ranks of S. cerevisiae and H. sapiens orthologs used in this article are 126.57 and 121.21, respectively. These are almost twice the average cyclebase rank of S. pombe genes, which is 66.17. Since higher cyclebase rank corresponds to poorer periodicity, therefore potentially, there is greater uncertainty in the phase angles of S. cerevisiae and H. sapiens orthologs in comparison to S. pombe genes. Since we formulated our null hypotheses using S. cerevisiae and H. sapiens orthologs and used S. pombe data to test, the FSA is more likely to select fewer genes than otherwise. The above formulation resembles the classical ‘single sample’ statistical hypothesis testing problem. It would be useful to extend our procedure so that uncertainties in both orthologs are taken into account when formulating the testing problem, resembling the classical ‘two sample’ testing problem. Note that the ‘two-sample’ problem for testing the equality of two sets of orderings is not well developed even for Euclidean space data, and the problem is substantially more complicated for circular data.The proposed relative order for S. cerevisiae and H. sapiens were determined using the peak times reported in cyclebase and the published literature. We recognize that the exact order among some of the ‘neighboring’ genes is difficult to ascertain. Thus, there is a potential for misspecification of the relative order. This resembles the classical problem of model misspecification that occurs so commonly in a variety of situations. If a biologist chooses to refine our proposed relative order based on her/his understanding of the functions and order of the genes, then she/he may explore such alternative orders and test them using our proposed methodology. A biologist could also select best fitting relative orders using the lack of fit criterion introduced in this article. Hence in this article we have provided a general methodology that would allow biologists to hypothesize a sequential order of peak expression for cell cycle genes and test it.A freely downloadable SAS based user-friendly software can be obtained by either contacting the first author or by visiting www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm. An R package containing this and other circular data analysis routines is being developed and will soon be made available at the above address.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online. Supplementary Tables S1–S4. Supplementary Information and Supplementary References [26-31].
FUNDING
Spanish Ministerio de Ciencia e Innovación (MTM2009-11161 to M.A.F. and C.R.); Intramural Research Program of the National Institutes of Health, National Institute of Environmental Health Sciences (Z01 ES101744-04). Funding for open access charge: Biostatistics Branch, NIEHS, National Institutes of Health.Conflict of interest statement. None declared.
Authors: Delong Liu; David M Umbach; Shyamal D Peddada; Leping Li; Patrick W Crockett; Clarice R Weinberg Journal: Proc Natl Acad Sci U S A Date: 2004-05-03 Impact factor: 11.205
Authors: P T Spellman; G Sherlock; M Q Zhang; V R Iyer; K Anders; M B Eisen; P O Brown; D Botstein; B Futcher Journal: Mol Biol Cell Date: 1998-12 Impact factor: 4.138
Authors: M Belén Suárez; María Luisa Alonso-Nuñez; Francisco del Rey; Christopher J McInerny; Carlos R Vázquez de Aldana Journal: Cell Cycle Date: 2015-08-03 Impact factor: 4.534
Authors: Cristina Rueda; Miguel A Fernández; Sandra Barragán; Kanti V Mardia; Shyamal D Peddada Journal: Biometrics Date: 2016-03-17 Impact factor: 2.571
Authors: Genevieve Stein-O'Brien; Luciane T Kagohara; Sijia Li; Manjusha Thakar; Ruchira Ranaweera; Hiroyuki Ozawa; Haixia Cheng; Michael Considine; Sandra Schmitz; Alexander V Favorov; Ludmila V Danilova; Joseph A Califano; Evgeny Izumchenko; Daria A Gaykalova; Christine H Chung; Elana J Fertig Journal: Genome Med Date: 2018-05-23 Impact factor: 11.117