Literature DB >> 34463706

Inferring causality in biological oscillators.

Jonathan Tyler^1,2, Daniel Forger^1,3, JaeKyoung Kim^4,5.

Abstract

MOTIVATION: Fundamental to biological study is identifying regulatory interactions. The recent surge in time-series data collection in biology provides a unique opportunity to infer regulations computationally. However, when components oscillate, model-free inference methods, while easily implemented, struggle to distinguish periodic synchrony and causality. Alternatively, model-based methods test the reproducibility of time series given a specific model but require inefficient simulations and have limited applicability.
RESULTS: We develop an inference method based on a general model of molecular, neuronal, and ecological oscillatory systems that merges the advantages of both model-based and model-free methods, namely accuracy, broad applicability, and usability. Our method successfully infers the positive and negative regulations within various oscillatory networks, e.g., the repressilator and a network of cofactors at the pS2 promoter, outperforming popular inference methods. AVAILABILITY: We provide a computational package, ION (Inferring Oscillatory Networks), that users can easily apply to noisy, oscillatory time series to uncover the mechanisms by which diverse systems generate oscillations. Accompanying MATLAB code under a BSD-style license and examples are available at ttps://github.com/Mathbiomed/ION. Additionally, the code is available under a CC-BY 4.0 License at https://doi.org/10.6084/m9.figshare.16431408.v1. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: Chemical

Year: 2021 PMID： 34463706 PMCID： PMC8696107 DOI： 10.1093/bioinformatics/btab623

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 Introduction

A fundamental goal in biology is to uncover causal interactions. Conventional methods manipulate one or more components experimentally to investigate the effect on others in the system. However, these are time-consuming and costly, particularly as the number of components increases. On the other hand, recent technological advances (e.g. GFP, luciferase, microarray, etc.) continue to make measuring time-series data easier. Accordingly, inferring direct regulations solely given time-series data is crucial to revealing the mechanisms underlying systems in a timely and inexpensive manner (Saint-Antoine and Singh, 2020). Various model-free methods have been widely used to infer interactions because they are easy to implement and broadly applicable (Casadiego ; Deyle and Sugihara, 2011; Deyle , 2016; Granger, 1969; Leng ; Ma ; Pourzanjani ; Runge ; Stokes and Purdon, 2017; Sugihara ; Tani ; Tsonis ; Wang ; Ye ). A popular model-free method, Granger Causality (GC), uses predictability to infer interactions, i.e. X causes Y if X has unique information that improves the prediction of Y (Granger, 1969; Stokes and Purdon, 2017). However, GC relies heavily on the assumption that the time-series data are stationary (Lütkepohl, 2005) making it challenging to apply to oscillatory time-series data that are highly non-stationary (Abel ; Pourzanjani ; Stokes and Purdon, 2017; Yang ). To overcome this limitation, inference methods for dynamical systems, such as Convergent Cross Mapping (CCM), use a differing view of predictability to infer causality, i.e. X causes Y if historical values of X can be recovered from Y alone (Deyle ; 2016; Deyle and Sugihara, 2011; Leng ; Ma ; Runge ; Sugihara ; Tani ; Tsonis ; Wang ; Ye ). Despite the success of CCM methods, they struggle to differentiate synchrony (i.e. similar periods among components) versus causality, frequently resulting in an increase in false-positive inferences in oscillatory networks. This is problematic because biological processes frequently exhibit oscillatory behavior in time-series data, e.g. about half of the protein-coding genome is transcribed rhythmically (Mure ; Zhang ). Alternatively, model-based methods infer causality by testing reproducibility of time-series data with mechanistic models. Although testing reproducibility requires computationally expensive model simulations and fittings (Balsa-Canto ; Firman ; Geva-Zatorsky ; Gotoh ; Lillacci and Khammash, 2010; McBride and Petzold, 2018; Mhaskar ; Pitt and Banga, 2019; Radde and Kaderali, 2009; Stražar ; Toni ; Trejo Banos ; Wang ; Wang and Enright, 2013), if the underlying model is accurate, model-based methods do not suffer from false positive predictions unlike model-free methods. However, the inference results strongly depend on the choice of model, often derived from limited information. Thus, inference methods using more general ODE forms were developed (Brunton ; Jensen ; Jo ; Kim and Forger, 2012b; Konopka, 2011; Konopka and Rooman, 2010; Mangan ; McGoff ; Pigolotti , 2009). For example, previously, we developed a method that infers causation from X to Y by checking the reproducibility of oscillatory time-series data given a common ODE model: . Here, f and g describe the synthesis and degradation rates of Y, respectively (Kim and Forger, 2012b). Pigolotti considered the most general possible mechanistic model between two components: However, unlike Kim and Forger (2012b), this method uses only the minima and maxima of the time-series data (Pigolotti ), thus requiring the restrictive assumption that all given components are in a single negative feedback loop. Moreover, extensions of the method require that a single negative feedback loop structure drive the dynamics, limiting their applicability (Jensen ; Pigolotti ). Here, we develop an inference method for biological oscillators described by Equation (1) that is easy to implement, broadly applicable and accurate, while also computationally efficient. Specifically, we identify a fundamental relationship between the general model (Equation 1) and its oscillatory solution. Using this relationship, we develop a functional transformation (i.e. regulation-detection function) of a pair of oscillatory time-series data that easily tests the reproducibility of the time series with the general model. This transformation enables accurate and precise inference of the (self-) regulation type (e.g. positive, negative or a mixture) between two components X and Y described by Equation (1). Our method infers regulations within various network structures such as a cycle, multiple cycles and a cycle with outputs from in silico oscillatory time-series data. Furthermore, our method successfully infers regulation types from noisy experimental data measured at the molecular and organismal levels. In particular, from time-series data of the repressilator and cofactors at the pS2 promoter, our method infers networks that match current biological knowledge while popular model-free methods incorrectly infer nearly fully connected networks. Importantly, our method predicts hidden regulations for the pS2 promoter after estradiol treatment, guiding experimental investigation. Finally, we provide a user-friendly computational package (ION: Inferring Oscillatory Networks) that implements our method to infer network structures of biological oscillators.

2 Results

2.1 Inferring regulation types from oscillatory time series

The reduced FitzHugh-Nagumo model (Fig. 1A) (FitzHugh, 1961) describes the interactions between the membrane potential of a neuron (V) and the accommodation and refractoriness of the membrane (W) (FitzHugh, 1961; Nagumo ). In particular, W positively regulates V while V negatively regulates W. In addition, V displays a mixture of positive and negative self-regulation while W negatively regulates itself.

Fig. 1.

Regulation-detection functions and scores reflect regulation types. (A) The FitzHugh-Nagumo model describes the interactions between the membrane potential of a neuron (V) and the accommodation and refractoriness of the membrane (W). W positively regulates V while V negatively regulates W. In addition, V displays a mixture of positive and negative self-regulation while W negatively regulates itself. (B) Time series of one cycle simulated with the FitzHugh-Nagumo model (A). Although W positively regulates V (i.e. positively depends on W), decreases despite increasing W (yellow region) because the self-regulation of V on masks the effect of W on . On the other hand, for the time points t and reflection time t where , if W(t) is greater (less) than , then should be greater (less) than . Similarly, as V negatively regulates W, if V(t) is greater (less) than , then should be less (greater) than for the time points t and t, where . (C) The regulation-detection function (Equation 2) is positive, and thus the regulation-detection score (Equation 4) equals one, reflecting the positive regulation of W on V. The sign of (Equation 3) changes, and thus (Equation 4), reflecting the mixture of positive and negative self-regulation of V. (D) Both and are negative, and thus , reflecting the negative regulation of V on W and the self-regulation of W. (E) Regulation-detection scores are calculated from the time-series population data of two bacteria: Paramecium, P (blue) and Didinium, D (red) (data taken from (Sugihara )). Reflecting the known predatory interaction, and . Furthermore, reflecting that the self-regulation of both P and D consists of positive (i.e. birth) and negative (i.e. death) regulation, and (Color version of this figure is available at Bioinformatics online.)

How are such inter- and self-regulations reflected in the oscillatory change of V and W simulated with the model (Fig. 1B)? Notably, the changes in V and W do not directly reflect their regulatory interactions. For instance, although W positively regulates V, when W increases, V does not always increase (e.g. in the yellow region, Fig. 1B). This is because W positively regulates rather than V (Fig. 1A). However, the relationship between the change in W and also does not reflect the positive regulation of W on V. For example, in the yellow region (Fig. 1B), decreases despite increasing W because the self-regulation of V on masks the effect of W on . Thus, to infer the effect of W on independently of V, we investigate the relationship between W and at time points t and the reflection time, t where (Fig. 1B). Since , the difference is solely determined by W. Thus, because W positively regulates V (Fig. 1A), if W(t) is greater (less) than should be greater (less) than . Similarly, to infer the type of self-regulation of V, we must remove the variation of due to W that masks the effect of V on . Thus, we investigate the relationship between V and at time points t and the reflection time, t, where (Fig. 1B). To quantify such relationships between W and and V and , we develop the regulation-detection functions: and As W positively regulates V, the functions and have the same sign and thus, throughout the cycle (Fig. 1C, left). That is, if , then (Fig. 1A). On the other hand, due to the mixed self-regulation of V, the relationship between the signs of and , and thus the sign of , varies throughout the cycle (Fig. 1C, right). As the profiles of the sign of the regulation-detection functions (Equations 2 and 3) reflect the regulation type, we develop a regulation-detection score that quantifies the variation in the sign of the regulation-detection functions. For instance, the regulation-detection score for the regulation of W on V is defined as where τ is the period (e.g. τ = 1 in Fig. 1C, left). The regulation-detection score because W positively regulates V, and thus (i.e. the negative area is zero) (Fig. 1C, left). On the other hand, because V both positively and negatively regulates itself, takes both positive and negative values, so (Fig. 1C, right). Similarly, we can obtain information about the regulation of V on W and the self regulation of W with the regulation-detection functions (Fig. 1D, left) and (Fig. 1D, right). Because V negatively regulates W, . Also, because the self-regulation of W is purely negative, . Thus, , and (Fig. 1D). Taken together, in general, if X positively (negatively) regulates Y, then () (see Supplementary Theorem S1 in Supplementary Information). Next, we calculated the regulation-detection scores from experimentally measured oscillatory time-series data of two bacteria: Paramecium (P) and Didinium (D) (Fig. 1E) (Veilleux, 1976). As P is a prey of the predator D (Veilleux, 1976), D is expected to negatively regulate P and P is expected to positively regulate D. Reflecting this, and (Fig. 1E). Furthermore, reflecting the positive (i.e. birth) and negative (i.e. death) self-regulation of both P and D, and (Fig. 1E). The regulation-detection scores appear to accurately reflect regulation types even for noisy and discrete time-series data.

2.2 Regulation inference method from oscillatory time series

If X positively (negatively) regulates Y, then the reflection score (resp., –1). That is, indicates either mixed regulation or the absence of regulation. Thus, when interactions are not mixed (i.e. monotonic), such as gene regulation by a transcription factor or predator-prey relationships, indicates the absence of regulation. This can be used to infer regulations from time-series data, as positive or negative regulation is present only when or –1, respectively. Similarly, self-regulation, which is either positive or negative, is possible only when or –1. However, since depletion of a component typically increases as its own concentration increases, self-regulation can be assumed to be negative (i.e. ). In this case, positive or negative regulation from X to Y is possible only when or , and thus, indicates the absence of regulation (Rule 1, Fig. 2A). Furthermore, we use or to infer positive or negative regulation (Rules 2 and 3, Fig. 2A). Note that, if positive or mixed self-regulation is possible, as in Figure 1E, Rules 2 and 3 can be relaxed to and , respectively.

Fig. 2.

The inference method successfully infers regulations within various in silico network structures. (A) The three rules for regulation inference. indicates the absence of regulation and or indicates positive or negative regulation. (B) The three rules successfully infer the network structure of the Kim-Forger model from simulated time-series data. According to Rule 1, the three regulations and are inferred as absent. According to Rules 2 and 3, the two positive regulations ( and ) and the one negative regulation (), which have and , are inferred. (C,D) Our inference method also successfully infers the negative feedback loop of the Frzilator (C) and a 4-state Goodwin oscillator (D). (E,F) Our inference method also successfully infers correct regulations for more challenging cases beyond the single feedback loop structure, i.e. the mixture of the Kim-Forger and Goodwin models (E) and an extended Kim-Forger model with output variables (F). (G) Our method also successfully infers regulations (solid arrows) of the repressilator from its three mRNA (solid lines) and three protein time-series data (dashed lines). However, our method also falsely predicts negative regulations among the proteins (dashed arrows) due to the similar time series between an mRNA and its protein (e.g. M1 and P1). See Supplementary Tables S1–S5 for the complete list of regulation-detection scores for (C)–(G) and Supplementary Section S4 in Supplementary Information for the equations and parameters used to simulate the data

We illustrate how the three rules (Fig. 2A) infer regulations using as an example the Kim-Forger model (Fig. 2B), a simple model describing the mammalian circadian clock (Kim, 2016; Kim and Forger, 2012a). To infer the network structure (Fig. 2B, bottom) from the time series (Fig. 2B, top), we compute for each possible interaction and self-regulation pair (Fig. 2B, box). Using Rule 1, three regulations are inferred as absent (Fig. 2B, box). Furthermore, Rules 2 and 3 identify the two positive regulations ( and ) and the one negative regulation (), which have and , respectively (Fig. 2B, box). This successfully infers the negative feedback loop structure (Fig. 2B, bottom). Our method also successfully infers regulations in the Frzilator negative feedback loop (Igoshin ) (Fig. 2C and Supplementary Table S1) and a 4-state Goodwin oscillator (Goodwin, 1965) (Fig. 2D and Supplementary Table S2). In fact, the order of peaks and nadirs of the time series in single feedback loops matches the order of regulation in the feedback loop (Fig. 2B–D). For instance, the peak of M is followed by the peaks of P and then P (Fig. 2B). This property has been used in previous algorithms to infer single negative feedback loop structures (Jensen ; Pigolotti , 2009). However, since experimental datasets often contain components from more than one system, we test our method on a more challenging case when data are merged from the Kim-Forger (Fig. 2E, top; solid lines) and Goodwin (Fig. 2E, top; dashed lines) models. If only the order of peaks is used for this example, a single negative feedback loop with seven components is inferred whereas our method successfully infers the two independent underlying networks (Fig. 2E, bottom and Supplementary Table S3). Moreover, our inference method also successfully infers a cyclic network with output variables, also not adhering to the single feedback loop structure (Fig. 2F and Supplementary Table S4). While our method successfully infers regulations within various networks, we caution that can occur even in the absence of regulation, making some correct interactions difficult to distinguish. For example, in the original repressilator model (Fig. 2G, top) (Elowitz and Leibler, 2000; Potvin-Trottier ), the mRNA and protein time series are so similar in phase (i.e. the phase difference is only 2.4% of the total period) that our method, along with inferring the actual interactions, predicts spurious interactions from one protein to the next protein. Thus, we advise users to check for nearly identical time series, which may increase false-positive inferences in our method as well as other inference methods.

2.3 Robustness of the inference method to interpolation error and noise

Experimentally measured time-series data are sampled discretely, in which case our method uses interpolation to generate continuous data (see Section 4.1.1). Accordingly, we test how sensitive our method is to interpolation error, specifically after linear interpolation, by using the in silico datasets in Figures 2B–F. That is, by decreasing the points measured per period from 102 to 101 (i.e. increasing the interpolation error), we quantify the accuracy of our inference method with the score, i.e. the harmonic mean of precision and recall (Fig. 3A). and indicate perfect recovery of all regulations and absence of any correct regulations, respectively. To account for interpolation error, we accept interactions based on three thresholds for values: 0.99, 0.95 and 0.90. For example, a threshold of 0.99 means that we accept any interaction that satisfies both and . We run our method beginning at 100 randomly selected times (Section 4.3). Then, we investigate how the mean of the distribution of scores changes as the sampling rate decreases (Fig. 3B). For single negative feedback loops (i.e. Frzilator, Goodwin, Kim-Forger), our method accurately recovers the network even when the number of data points measured per period is relatively low, e.g. ten per cycle. For the more complicated models (i.e. the merged Goodwin and Kim-Forger and the Kim-Forger with outputs models), slightly more data points are required for inference at high accuracy. Furthermore, our method shows similar robustness across the three thresholds, especially when the points sampled are toward the lower end.

Fig. 3.

Our regulation inference method is robust to interpolation error and to noise. (A) The illustration of F1 calculation with a simple network (true; left, inferred; right). Precision is the number of correctly inferred regulations (TP-true positives) divided by the number of all inferred regulations (TP+FP-false positives), including those inferred incorrectly (FP). Recall is the number of correctly inferred regulations (TP) divided by the number of all true regulations (TP + FN-false negatives). Then, the F1 score is the harmonic mean of precision and recall, which we use to measure the accuracy of our inference method. and 0 indicate perfect recovery of all regulations and the absence of any correct regulations, respectively. (B,C) The accuracy of our inference method when the number of points measured (B) and the level of noise (C) vary. Here, the number of points measured per period decreases from 102 to 101 (B) and the multiplicative noise increases from 0 to 10%, which is sampled from (C). The mean of the score for 100 different time series, which are generated with randomly chosen phases (B) and noise levels (C), is plotted (Section 4.3). Different thresholds for , 0.99 (left), 0.95 (middle) and 0.90 (right), are used

Next, because experimental data are noisy, we increase the level of the multiplicative noise added to the dataset from 0 (no noise) to 10% multiplicative noise (sampled from ). The scores tend to decrease, but the decrease occurs more dramatically when the threshold is 0.99, indicating that the high threshold leads to higher sensitivity to noise. Moreover, this decrease in scores with the threshold of 0.99 is a result of an increase in false negatives (i.e. the exclusion of true interactions due to noise). Thus, we use a threshold of 0.9 when applying our inference method to experimental data (see below) as it leads to the most accurate results in the presence of noise (Fig. 3C). While 0.9 is recommended, depending on the weight of either avoiding false-positive or false-negative predictions, users can adjust the threshold when using our computational package, Inferring Oscillatory Networks (ION) (Fig. 4A; see Supplementary Information and Supplementary Figs S1 and S2 for a step-by-step manual). See Supplementary Information for details about how to choose the threshold.

Fig. 4.

The computational package ION successfully infers networks of repressilator and pS2 promoter cofactors. (A) ION interpolates and smooths noisy and discrete time-series data and then computes the pairwise and self regulation-detection functions and scores () for every pair of components. If satisfies Rules 2 or 3 (Fig. 2A), which can be relaxed depending on the threshold specified by the user, positive or negative regulation is assigned. See Supplementary Information for a comprehensive manual. (B) Using experimentally measured oscillatory time series from (Potvin-Trottier ), ION successfully infers a three-gene repressilator network structure (see Supplementary Fig. S3 for details). On the other hand, two popular model-free inference methods, PCM and GC, infer several false-positive regulations (e.g. λ cI regulates LacI). (C) ION also successfully infers two independent cycle when the experimental repressilator dataset from (A) is duplicated and the phase is shifted by about half of the period (see Supplementary Fig. S3 for details). However, PCM and GC fail to infer independent cycles due to false-positive predictions. (D) ION infers two direct regulations and predict three hidden regulations among cofactors at the estrogen-sensitive pS2 promoter after estradiol treatment (Métivier ) (see Supplementary Fig. S3 for details)

2.4 Successful inferences from experimentally measured time series

As our inference method (ION) is quite robust to discrete data sampling and noise, we expect that our inference method can accurately infer regulations from experimentally measured time series as well. Indeed, our method successfully infers a three-gene repressilator network from experimental data of the three proteins (Potvin-Trottier ) (Fig. 4B and Supplementary Table S6). Note that our method recovers the repressilator network despite the absence of mRNA data because the shape and phase of the mRNA and protein profiles are expected to be similar, as in Figure 2G, due to the short translation time in E.coli relative to the period (Choi ). This indicates that our method also infers indirect regulation with a short time delay. Moreover, we compare our method with two popular model-free inference methods, PCM (Leng ) and GC (Granger, 1969) (Fig. 4B). As these methods can only infer the presence of regulation, not its type (i.e. positive and negative), unlike our method, the arrows represent inferred regulations, which could be either positive or negative. PCM recovers two correct regulations, and , but fails to recover the regulation and makes two false-positive predictions, and (Fig. 4B, middle). While the GC infers all existing regulations, it makes two additional false-positive predictions, and (Fig. 4B, right). Even for this simple three-node network, the popular model-free inference methods make false-positive predictions because the network components oscillate at the same period (Cobey and Baskerville, 2016). Next, we consider a more challenging case: the combination of two copies of the dataset in Figure 4B, one at the original phase and one with shifted phase (Fig. 4C and Supplementary Table S7). Our method successfully infers two repressilator networks, whereas PCM and GC suffer from several spurious regulations (4 and 15 false-positive interactions in Fig. 4B and C, respectively). Note that, even though we are using the same repressilator dataset, there are inconsistencies in the PCM and GC results compared with those from Figure 4B. These inconsistencies are a consequence of the shortened length of data used in Figure 4C compared with that in Figure 4B. This indicates that, in addition to the risk of false-positive inference, PCM and GC are sensitive to the amount of data, unlike ours. For time series measuring the amount of cofactors present at the estrogen-sensitive pS2 promoter after treatment with estradiol [data from Métivier and Lemaire ], PCM and GC infer an almost fully connected network and a fully connected network, respectively (Fig. 4D). On the other hand, our method only infers two regulations, both supported by the current biological understanding of the system. That is, estradiol triggers the binding of human ERα (hER) to the pS2 promoter to recruit RNA Polymerase II, supporting the inferred positive regulation of POLII by hER. Furthermore, TRIP1 acts as a surrogate measure for the 20S proteasome (APIS), which promotes proteasome-mediated degradation of hER (Métivier ), supporting the inferred negative regulation of hER by TRIP1. However, the inferred network (Fig. 4D, Supplementary Table S8) does not contain a negative feedback loop, which is required to generate sustained oscillations (Novák and Tyson, 2008). Thus, there may be intermediate steps between POLII and TRIP1, TRIP1 and HDAC, and also HDAC and hER that form the negative feedback loop (Fig. 4D; question marks). Altogether, this illustrates that our method can identify direct regulations while highlighting connections that require further experimental investigation.

3 Discussion

We developed a model-based method that infers regulations within networks underlying biological oscillators. The method identifies positive or negative regulation by efficiently testing the reproducibility of time-series data given Equation (1). Our method successfully inferred several networks such as single cycles (e.g. repressilator), two independent cycles and a cycle structure with outputs. Importantly, our method can distinguish direct versus indirect regulations, unlike GC and CCM (Fig. 4) (Leng ). That is, when , our method typically infers , not (Fig. 2B–F). However, if is fast and thus Y and Z oscillate with nearly identical phases, our method infers as well (Fig. 2G). Thus, if networks contain hidden steps with fast time scales (i.e. short time delays), our method may infer additional indirect regulations. Furthermore, we provide a user-friendly computational package, ION, that infers regulations within biological networks that oscillate from the molecular to the population level. When our method is coupled with evolving experimental time series, it can uncover unknown functional relationships and mechanisms that drive oscillatory behavior in biological systems. Our method merges the advantages of model-based and model-free methods while mitigating the drawbacks of each. In particular, our model-based inference method does not suffer from the serious risk of false-positive prediction for biological oscillators or sensitivity to the amount of data, unlike the previous model-free inference methods such as GC and PCM (Fig. 4). However, as our inference method is model-based, it runs the risk that the imposed ODE model and functional relationships create false representations of the true interactions (Cobey and Baskerville, 2016). Our method minimizes this risk by using the most general form of an ODE Equation (1) to model the change in a component that is acted upon by another component and itself. In this way, we resolve the limitations of previous model-based methods that restricted the class of models, such as separable synthesis and degradation functions (Jo ; Kim and Forger, 2012b; Konopka and Rooman, 2010), specific types of functions (e.g. power or Hill functions) (Gotoh ; Konopka and Rooman, 2010) and a single feedback loop (Jensen ; Pigolotti , 2009). Thus, we were able to uncover several varying network structures. While we considered the most general form of an ODE Equation (1) that describes the interactions between two components, an interesting future direction would be to extend our work to models that describe the interactions among multiple oscillatory components, e.g. .

4 Materials and methods

4.1 Inferring Oscillatory Networks (ION) computational package

We provide user-friendly MATLAB code at https://github.com/Mathbiomed/ION (Github) and https://doi.org/10.6084/m9.figshare.16431408.v1 (figshare). The ION package can be used to infer the network structure of oscillators, which are described by Equation (1), across all levels of biology. Here, we briefly describe the key steps of the ION package (see Supplementary Information for a comprehensive manual).

4.1.1 Reflection times

For each time point t of the given time series , first, the reflection time needs to be calculated (Fig. 1B). That is, we seek the time point such that and the signs of the slopes at and are opposite (i.e. rising and falling phase). For this, the discrete X(t) is interpolated to a continuous time series with either the ‘linear’ or ‘fourier’ interpolation method, chosen by the user. Then, is estimated by identifying the closest time point to t among time points t satisfying the following equation: Regulation-detection functions and scores reflect regulation types. (A) The FitzHugh-Nagumo model describes the interactions between the membrane potential of a neuron (V) and the accommodation and refractoriness of the membrane (W). W positively regulates V while V negatively regulates W. In addition, V displays a mixture of positive and negative self-regulation while W negatively regulates itself. (B) Time series of one cycle simulated with the FitzHugh-Nagumo model (A). Although W positively regulates V (i.e. positively depends on W), decreases despite increasing W (yellow region) because the self-regulation of V on masks the effect of W on . On the other hand, for the time points t and reflection time t where , if W(t) is greater (less) than , then should be greater (less) than . Similarly, as V negatively regulates W, if V(t) is greater (less) than , then should be less (greater) than for the time points t and t, where . (C) The regulation-detection function (Equation 2) is positive, and thus the regulation-detection score (Equation 4) equals one, reflecting the positive regulation of W on V. The sign of (Equation 3) changes, and thus (Equation 4), reflecting the mixture of positive and negative self-regulation of V. (D) Both and are negative, and thus , reflecting the negative regulation of V on W and the self-regulation of W. (E) Regulation-detection scores are calculated from the time-series population data of two bacteria: Paramecium, P (blue) and Didinium, D (red) (data taken from (Sugihara )). Reflecting the known predatory interaction, and . Furthermore, reflecting that the self-regulation of both P and D consists of positive (i.e. birth) and negative (i.e. death) regulation, and (Color version of this figure is available at Bioinformatics online.)

4.1.2 Regulation-detection function and score

Using the estimated , we compute the regulation-detection function, e.g. , for each time point t as follows: If the linear method is chosen, is linearly interpolated based on the data , and is estimated using a moving slope filter method. Specifically, after fitting a low-order polynomial regression model to with a sliding window (Oppenheim ), the derivative of the polynomial fit is used to estimate ,and then is linearly interpolated based on the estimated . The model order and the length of the sliding window parameters can be adjusted (see Supplementary Information). On the other hand, if the fourier method is chosen, both and are estimated as and , respectively, and similarly, and are estimated as and , respectively, where is the Fourier series fit to the data Y(t). Finally, in both cases, the regulation-detection score (Equation 4) is estimated using the MATLAB function trapz.

4.2 Time-series data

We simulate in silico data using the MATLAB function ode23tb (Fig. 2). See Supplementary Information for the model equations and parameters. The experimental datasets of the repressilator (Fig. 4B) were obtained from (Potvin-Trottier ). Next, to generate the duplicated experimental repressilator dataset (Fig. 4C), we mixed two copies of the repressilator dataset from Figure 4B. We kept one copy at the original phase and, for the second copy, we shifted the phase by 115 min (almost half of the period) (Fig. 4C). Then, we removed data on the left and the right where there was only coverage of one of the two datasets. We obtained the estradiol dataset (Fig. 4D) from (Lemaire ; Métivier ) and the Paramecium/Didinium data (Fig. 1E) from (Sugihara ). The inference method successfully infers regulations within various in silico network structures. (A) The three rules for regulation inference. indicates the absence of regulation and or indicates positive or negative regulation. (B) The three rules successfully infer the network structure of the Kim-Forger model from simulated time-series data. According to Rule 1, the three regulations and are inferred as absent. According to Rules 2 and 3, the two positive regulations ( and ) and the one negative regulation (), which have and , are inferred. (C,D) Our inference method also successfully infers the negative feedback loop of the Frzilator (C) and a 4-state Goodwin oscillator (D). (E,F) Our inference method also successfully infers correct regulations for more challenging cases beyond the single feedback loop structure, i.e. the mixture of the Kim-Forger and Goodwin models (E) and an extended Kim-Forger model with output variables (F). (G) Our method also successfully infers regulations (solid arrows) of the repressilator from its three mRNA (solid lines) and three protein time-series data (dashed lines). However, our method also falsely predicts negative regulations among the proteins (dashed arrows) due to the similar time series between an mRNA and its protein (e.g. M1 and P1). See Supplementary Tables S1–S5 for the complete list of regulation-detection scores for (C)–(G) and Supplementary Section S4 in Supplementary Information for the equations and parameters used to simulate the data

4.3 Discrete and noisy data

To generate discretely sampled data (Fig. 3B), we select a random point in the first period to begin data extraction, and then we uniformly sample two periods worth of data at a sampling rate of 100 points per period. We repeat this process 100 times—every time randomly initializing the starting point in the first period—to generate 100 distinct datasets for every model. Then, we run our algorithm and compute scores for each of the 100 datasets. Next, from each of the 100 generated datasets, we take every other data point to reduce the number of data points (e.g. per period). Our regulation inference method is robust to interpolation error and to noise. (A) The illustration of F1 calculation with a simple network (true; left, inferred; right). Precision is the number of correctly inferred regulations (TP-true positives) divided by the number of all inferred regulations (TP+FP-false positives), including those inferred incorrectly (FP). Recall is the number of correctly inferred regulations (TP) divided by the number of all true regulations (TP + FN-false negatives). Then, the F1 score is the harmonic mean of precision and recall, which we use to measure the accuracy of our inference method. and 0 indicate perfect recovery of all regulations and the absence of any correct regulations, respectively. (B,C) The accuracy of our inference method when the number of points measured (B) and the level of noise (C) vary. Here, the number of points measured per period decreases from 102 to 101 (B) and the multiplicative noise increases from 0 to 10%, which is sampled from (C). The mean of the score for 100 different time series, which are generated with randomly chosen phases (B) and noise levels (C), is plotted (Section 4.3). Different thresholds for , 0.99 (left), 0.95 (middle) and 0.90 (right), are used For the multiplicative noise analysis (Fig. 3C), we begin with two periods worth of data sampled at 100 points per period. Then, we add multiplicative noise sampled randomly from a normal distribution with mean 0 and standard deviation given by the percentage. For example, at 10% multiplicative noise, we add the noise to , where ϵ is sampled randomly from .

4.4 PCM and GC

We ran the Partial Cross Mapping (PCM) (Leng ), an extension of CCM, with an embedding dimension of 3, τ = 1, a max delay of 3 and a threshold of 0.5684 as recommended in (Leng ). We ran the GC using the code provided in (Chandler, 2020), specifying a max delay of 3 as we did with the PCM and a significance level of 95%. We rejected the null hypothesis that Y does not Granger cause X, and thereby inferred direct regulations, if the value of the F-statistic was greater than the critical value from the F-distribution (Granger, 1969). Click here for additional data file.

49 in total

1. A synthetic oscillatory network of transcriptional regulators.

Authors: M B Elowitz; S Leibler
Journal: Nature Date: 2000-01-20 Impact factor: 49.962

2. Dynamical evidence for causality between galactic cosmic rays and interannual variation in global temperature.

Authors: Anastasios A Tsonis; Ethan R Deyle; Robert M May; George Sugihara; Kyle Swanson; Joshua D Verbeten; Geli Wang
Journal: Proc Natl Acad Sci U S A Date: 2015-03-02 Impact factor: 11.205

3. Detecting causality in complex ecosystems.

Authors: George Sugihara; Robert May; Hao Ye; Chih-hao Hsieh; Ethan Deyle; Michael Fogarty; Stephan Munch
Journal: Science Date: 2012-09-20 Impact factor: 47.728

4. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.

Authors: Steven L Brunton; Joshua L Proctor; J Nathan Kutz
Journal: Proc Natl Acad Sci U S A Date: 2016-03-28 Impact factor: 11.205

5. Detection of time delays and directional interactions based on time series from complex dynamical systems.

Authors: Huanfei Ma; Siyang Leng; Chenyang Tao; Xiong Ying; Jürgen Kurths; Ying-Cheng Lai; Wei Lin
Journal: Phys Rev E Date: 2017-07-25 Impact factor: 2.529

6. Oscillatory behavior in enzymatic control processes.

Authors: B C Goodwin
Journal: Adv Enzyme Regul Date: 1965

7. Gene expression model (in)validation by Fourier analysis.

Authors: Tomasz Konopka; Marianne Rooman
Journal: BMC Syst Biol Date: 2010-09-03

8. Synchronous long-term oscillations in a synthetic gene circuit.

Authors: Laurent Potvin-Trottier; Nathan D Lord; Glenn Vinnicombe; Johan Paulsson
Journal: Nature Date: 2016-10-12 Impact factor: 49.962

9. Waveforms of molecular oscillations reveal circadian timekeeping mechanisms.

Authors: Hang-Hyun Jo; Yeon Jeong Kim; Jae Kyoung Kim; Mathias Foo; David E Somers; Pan-Jun Kim
Journal: Commun Biol Date: 2018-11-26

10. Parameter estimation in models of biological oscillators: an automated regularised estimation approach.

Authors: Jake Alan Pitt; Julio R Banga
Journal: BMC Bioinformatics Date: 2019-02-15 Impact factor: 3.169

2 in total

1. Identification of Type 2 Diabetes Based on a Ten-Gene Biomarker Prediction Model Constructed Using a Support Vector Machine Algorithm.

Authors: Jiabin Li; Jieying Ding; D U Zhi; Kaiyun Gu; Hui Wang
Journal: Biomed Res Int Date: 2022-03-04 Impact factor: 3.411

2. Systematic inference identifies a major source of heterogeneity in cell signaling dynamics: The rate-limiting step number.

Authors: Dae Wook Kim; Hyukpyo Hong; Jae Kyoung Kim
Journal: Sci Adv Date: 2022-03-18 Impact factor: 14.136

2 in total