Mahdi Shafiee Kamalabad1, Alexander Martin Heberle2, Kathrin Thedieck2,3, Marco Grzegorczyk1. 1. Department of Mathematics, Bernoulli Institute, Faculty of Science and Engineering, University of Groningen, AG Groningen, The Netherlands. 2. Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, AV Groningen, The Netherlands. 3. Department for Neuroscience, School of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany.
Abstract
MOTIVATION: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular modelling tool for learning cellular networks from time series data. In systems biology, time series are often measured under different experimental conditions, and not rarely only some network interaction parameters depend on the condition while the other parameters stay constant across conditions. For this situation, we propose a new partially NH-DBN, based on Bayesian hierarchical regression models with partitioned design matrices. With regard to our main application to semi-quantitative (immunoblot) timecourse data from mammalian target of rapamycin complex 1 (mTORC1) signalling, we also propose a Gaussian process-based method to solve the problem of non-equidistant time series measurements. RESULTS: On synthetic network data and on yeast gene expression data the new model leads to improved network reconstruction accuracies. We then use the new model to reconstruct the topologies of the circadian clock network in Arabidopsis thaliana and the mTORC1 signalling pathway. The inferred network topologies show features that are consistent with the biological literature. AVAILABILITY AND IMPLEMENTATION: All datasets have been made available with earlier publications. Our Matlab code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular modelling tool for learning cellular networks from time series data. In systems biology, time series are often measured under different experimental conditions, and not rarely only some network interaction parameters depend on the condition while the other parameters stay constant across conditions. For this situation, we propose a new partially NH-DBN, based on Bayesian hierarchical regression models with partitioned design matrices. With regard to our main application to semi-quantitative (immunoblot) timecourse data from mammalian target of rapamycin complex 1 (mTORC1) signalling, we also propose a Gaussian process-based method to solve the problem of non-equidistant time series measurements. RESULTS: On synthetic network data and on yeast gene expression data the new model leads to improved network reconstruction accuracies. We then use the new model to reconstruct the topologies of the circadian clock network in Arabidopsis thaliana and the mTORC1 signalling pathway. The inferred network topologies show features that are consistent with the biological literature. AVAILABILITY AND IMPLEMENTATION: All datasets have been made available with earlier publications. Our Matlab code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Dynamic Bayesian networks (DBNs) have become a popular tool for learning the topologies of cellular regulatory networks from time series data. The classical (homogeneous) DBN models assume that the network parameters stay constant in time, so that the network structure is inferred along with one single set of network parameters (Friedman ). Many regulatory processes are non-stationary so that this homogeneity assumption is too restrictive. To allow for time-dependent parameters, many authors have proposed to combine DBNs with multiple changepoint processes (Ahmed and Xing, 2009; Dondelinger ; Grzegorczyk and Husmeier, 2013; Husmeier ; Lèbre ; Robinson and Hartemink, 2010) or with hidden Markov models (Grzegorczyk, 2016). In those models a multiple changepoint process (or a hidden Markov model) divides the temporal data into disjoint components with component-specific network parameters. The network structure, the data segmentation and the component-specific network parameters are inferred from the data. Those models are often referred to as non-homogeneous DBNs (NH-DBNs).In many real-world applications, in particular in systems biology, data are often collected under different experimental conditions. That is, instead of one single (long) time series that has to be segmented, there are K (short) time series. The data are then intrinsically divided into K unordered components, and there is no need for inferring the segmentation. In this situation, it is normally not clear a priori whether the network parameters stay constant across components (conditions) or whether they vary from component to component (with the conditions). Three biological systems that we will consider in this article are: Section 4.4: The parameters of a metabolism-related regulatory network in Saccharomyces cerevisiae can depend on the medium, in which yeast is cultured (glucose versus galactose). Section 4.5: The parameters of the circadian clock network in Arabidopsis thaliana can depend on the dark: light cycles, to which the plants were earlier exposed. Section 4.6: The parameters of the mammalian target of rapamycin complex 1 (mTORC1) protein signalling network can change in the presence of insulin. For more examples and a thorough discussion on the integration of single-cell data from multiple experimental conditions we refer to Geissen .If the parameters stay constant, all data can be merged and analysed with one single homogeneous DBN. If the parameters are component-specific, then the data should be analysed by a NH-DBN. The disadvantage of both approaches is that all parameters are assumed to be either constant (DBN) or component-specific (NH-DBN). In real-world applications there can be both types of parameters, so that both models are inappropriate. For example, if a variable Y is regulated by two other variables, , then the interaction can stay constant, while might be component-specific, e.g. for K = 2 and in terms of a regression model:A DBN ignores that β and γ are different. A NH-DBN has to infer the same parameter α two times separately. Both model misspecifications can increase the inference uncertainty, and are thus critical for sparse data.No tailor-made model for the situation in (1) has been proposed yet. To fill this gap, we propose a partially NH-DBN model, which infers the best trade-off between a DBN and a NH-DBN. Unlike all earlier proposed NH-DBNs, the new partially NH-DBN model operates on the individual interactions (network edges). For each interaction there is a parameter, and the model infers from the data whether the parameter is constant or component-specific. We implement the new model in a hierarchical Bayesian regression framework, since this model class reached the highest network reconstruction accuracy in the cross-method comparison by Aderhold . But we note that the underlying idea is generic and could also be implemented in other frameworks, e.g. via L1-regularized regression models (‘LASSO’).In Section 2.5 we propose a Gaussian process (GP)-based method to deal with the problem of non-equidistant measurements. The standard assumption for all NH-DBNs is that data are measured at equidistant time points. For applications where this assumption is not fulfilled, we propose to use a GP to predict the values at equidistant data points and to replace the non-equidistant values by predicted equidistant values. We will make use of the GP method when analysing the mTORC1 data in Section 4.6.
2 Materials and methods
DBNs and NH-DBNs are used to infer networks showing the regulatory interactions among variables . The interactions are subject to a time lag, so that there is no need for an acyclic network structure. Hence, dynamic network inference can be thought of as inferring the covariate sets for N independent regression models. In the ith model, Z is the response and the remaining variables at time point t − 1 are used as potential covariates for Z at time point t. The goal is to infer a covariate set for each Z, and the system of covariate sets describes a network; see Section 2.6 for details. As the same regression model is applied to each Z separately, we describe it using a general notation, where Y is the response and are the covariates.
2.1 Bayesian regression with partitioned design matrix
We consider a regression model with response Y and covariates . We assume that data were measured under K experimental conditions, which we refer to as K components. We further assume that the data for each component were measured at equidistant time points . Let and denote the values of Y and X at the tth time point of component k. In dynamic networks, the interactions are subject to a time lag , which is usually set to one time point. That is, the values correspond to the response value . For each component k we build a component-specific response vector and the corresponding design matrix , where includes a first column of 1’s for the intercept:
whereFor each k we could assume a separate Gaussian likelihood:
where is the identity matrix, is the component-specific vector of regression coefficients, and is the component-specific noise variance. Imposing independent priors on each pair , leads to K independent models. Alternatively, we could merge the data and and employ one model for the merged data:
so that would apply to all components.When some covariates have a component-specific and other covariates have a constant regression coefficient, both likelihoods (2) and (3) are suboptimal. For this situation, we propose a new partially non-homogeneous regression model that infers the best trade-off from the data. The key idea is to use a likelihood with a partitioned design matrix.For now, we assume that we know for each coefficient whether it is component-specific or constant. Let the intercept and the first coefficients stay constant while the remaining coefficients are component-specific. We then have the regression equation:
where , and the likelihood takes the form:
where is a vector of regression coefficients, and is a partitioned matrix with rows and columns. For example, for K = 2 the matrix has the structure:
where , and is of the form:The first subvector of is the vector of the regression coefficients that stay constant, and then there is a subvector for each component k with the component-specific regression coefficients. For the noise variance parameter we use an inverse Gamma prior, , and on we impose a Gaussian prior with zero mean vector:For the component-specific vectors we adapt the idea from Grzegorczyk and Husmeier (2013), and impose a hyperprior:The hyperprior couples the vectors hierarchically and encourages them to stay similar across components. Re-using the variance parameter in (5 and 6) allows the regression coefficient vectors and the noise variance to be integrated out in the likelihood, i.e. the marginal likelihood to be computed analytically (see below). For and we also use inverse Gamma priors:The prior of is a product of Gaussians:Given , and , the Gaussians are independent, so that:
where is the -dimensional and the -dimensional identity matrix. We have for the posterior distribution:A graphical model representation of the new regression model is provided in Figure 1. The full conditional distributions of and can be computed analytically, so that Gibbs-sampling can be applied to generate a posterior sample. As the derivations are mathematically involved, we relegate them to Part A of the Supplementary Material.
Fig. 1.
Graphical model representation of the regression model with partitioned design matrix. Variables that have to be inferred are in white circles. The data and the fixed hyperparameters are in grey circles. The vector deterministically depends on and . The vector in the plate is condition-specific
Graphical model representation of the regression model with partitioned design matrix. Variables that have to be inferred are in white circles. The data and the fixed hyperparameters are in grey circles. The vector deterministically depends on and . The vector in the plate is condition-specificThe marginalization rule from Section 2.3.7 of Bishop (2006) yields:
2.2 Inferring the relevant covariates and their types
In typical applications, there is a set of variables, and the subset of the relevant covariates has to be inferred from the data. Each covariate can be either constant (δ = 1) or component-specific (δ = 0). Let be a subset of the variables, and let be a vector of binary variables, where δi indicates whether X has a constant () or component-specific () regression coefficient. The first element, δ0, refers to the intercept.The goal is then to infer the covariate set and the corresponding indicator vector from the data. For any combination of and , the partitioned design matrix can be built, and the marginal likelihood can be computed with (8). We get for the posterior:
where is a Gaussian, whose dimension is the number of component-specific coefficients. For the covariate sets, , we follow Grzegorczyk and Husmeier (2013) and assume a uniform distribution, truncated to . The prior will be specified in Section 2.5. To generate samples from the posterior, we use a Markov Chain Monte Carlo (MCMC) algorithm, which combines the Gibbs-sampling steps for and with two blocked Metropolis Hastings (MHs) moves. In the first MH move the vector is sampled jointly with , and in the second MH move is sampled jointly with and . As the implementation of the MCMC algorithm is involved, we relegate the mathematical details to Parts B and C of the Supplementary Material.
2.3 Competing models
A homogeneous model merges all data, while a non-homogeneous model assumes each component k to have specific parameters; see (2). The new partially non-homogeneous model infers the best trade-off: Each regression coefficient can be either constant or component-specific.For a fair comparison, we also allow the non-homogeneous model to switch between a homogeneous and a non-homogeneous state. However, like all models that have been proposed so far, it operates on the covariate sets. All covariates have either component-specific (S = 0) or constant (S = 1) regression coefficients. In our method comparison, we include:DBN: A homogeneous model that merges all data, see (3).NH-DBN: The NH-DBN model switches between two states. We have a DBN for S = 1, and the likelihood takes the form of (2) for S = 0.coupled NH-DBN: This model from Grzegorczyk and Husmeier (2013) is an NH-DBN that globally couples the regression coefficients.
2.4 Specifying the covariate type prior
The NH-DBNs can switch between: ‘all covariates are constant’ (S = 1) versus ‘all covariates are component-specific’ (S = 0). Those states refer to and of the partially NH-DBN. To match the priors, we set:For contains n + 1 binary elements, which we assume to be independently Bernoulli distributed. To fulfil (9) the Bernoulli parameter must depend on . We get: and . From (9) we obtain:
For mixture models it is often assumed that the number of components has a Poisson distribution (Green, 1995). We truncate it to :
where is the density of the Poisson distribution with parameter θ = 1.
2.5 GP smoothing for non-equidistant data
The regression models assume that the time lag between the response value and the covariate values is the same for all t. If the data within a component k were measured at time points , with varying distances , the models lead to biased results. For this scenario, we propose to replace the observed non-equidistant response values by predicted equidistant response values. We propose the following GP-based method:Determine the lowest time lag , where .Given the observed data points , use a GP to predict the whole curve .Extract the response values at the time points: .Build the response vector and design matrix such that the values are used to explain the predicted response value (). The new lag is then constant; .A GP is a stochastic process , here indexed by time, such that every finite subset of the random variables has a Gaussian distribution. A GP can be used to estimate a non-linear curve from noisy observations. We here assume the relationship: where is observational noise, and the non-linear function is unknown. We estimate by fitting a GP to the observed data. The GP defines a distribution over the functions , which transforms the input ( into output (, such that
where is the identity matrix, and the elements of the T-by-T covariance matrix, , are defined through a kernel function: with signal variance parameter . The kernel function is typically chosen such that similar inputs t and t yield correlated variables and . A popular and widely used kernel is the squared exponential kernel with: where l is the length scale. For the unobserved vector we then have the predictive distribution:
with
where is the observed response vector, and and are T-by-T matrices, whose elements are given by: and . Before inferring the GP, we standardize to mean 0, and we impose log-uniform priors on the GP parameters ( and l). For predicting the unobserved response vector, we have to make two decisions:The GP parameters can either be sampled via MCMC simulations or their maximum a posteriori (MAP) estimates can be computed.Given GP parameters, the vector can be sampled from (11) or it can be set equal to its predictive expectation, , defined in (12).We have implemented and cross-compared all four combinations. For lack of space, we here report the results obtained for predictive expectations based on MAP estimates. A comparison of the four approaches can be found in Part D of the Supplementary Material.
2.6 Learning topologies of regulatory networks
Assume that the variables interact with each other in form of a network and that data were collected under K conditions and that the conditions influence some of the interactions. Let denote the N-by-T data matrix which was measured under condition k. The rows of correspond to the variables and the columns of correspond to T time points. denotes the value of Z at time point t under condition k.The goal is to infer the network structure. Interactions for temporal data are usually modelled with a time lag, e.g. of order . An edge, , indicates that Z has an effect on Z in the following sense: For all k the value (Z at t + 1) depends on (Z at t).There is no acyclicity constraint, and DBN inference can be thought of as inferring N separate regression models and combining the results. In the ith model is the response. The remaining variables are the potential covariates. For each we infer a covariate set , and the covariate sets describe a network . There is the edge in the network if and only if .We can thus apply the partially non-homogeneous model to each Y = Z separately, to generate posterior samples. We extract the covariate sets, (), and we merge them to a network sample . The rth network possesses the edge if and only if . For each edge we can then estimate its marginal posterior probability (‘score’):When the true network is known, we can evaluate the network reconstruction accuracy with precision-recall curves. For each we extract the edges whose scores exceed ψ, and we count the number of true positives among them. Plotting the precisions against the recalls , where M is the number of edges in the true network, gives the precision–recall curve (Davis and Goadrich, 2006). We refer to the area under the curve as AUC value. The higher the AUC, the higher the reconstruction accuracy.
3 Implementation
For the inverse Gamma distributed parameters () we use shape and rate parameters from earlier works, e.g. in Grzegorczyk and Husmeier (2013) and Lèbre : and and for the hyperprior on we use and . Other settings led to comparable results what indicates robustness w.r.t. those hyperparameters. To ensure a fair comparison we use the same hyperparameters for the competing models; cf. Section 2.3.For generating posterior samples, we run the MCMC algorithm from Section 2.2 for 100 000 (100k) iterations. We set the burn-in phase to 50k and we sample every 100th graph during the sampling phase. This yields R = 500 posterior samples for each response Y = Z. We merge the individual covariate sets (; ) to a network sample , as explained in Section 2.6. For each edge we then compute its edge score .We used scatter plots of edge scores from independent simulations to monitor convergence. In Section 4.3 we study convergence, scalability and the computational costs for model inference.We implement the GP method with the squared exponential kernel and used the Matlab package ‘GPstuff’ (Vanhatalo ) to numerically determine the MAP estimates of the parameters via scaled conjugate gradient optimization. We also tested other kernels, such as the Matern 3/2 and 5/2 kernel, and for them we obtained very similar results.
4 Empirical results
4.1 Pre-study 1: GP smoothing
Our first objective is to provide empirical evidence that the proposed GP method from Section 2.5 can yield substantial improvements. To this end, we generate values for 10 autoregressive (AR) variables:
where all ’s are independently N(0, 1) distributed, for all i, and . This yields: for all t and all i. We further assume that X1 and X2 are covariates for:
where .In a second scenario we replace (13) by moving averages (MA):
where all ’s are independently distributed, so that again for all i and t.We generate data for both scenarios (AR and MA) with different parameter settings in (14) and η in (13), respective q in (15). We thin the data out and keep only the observations at the time points , as the same time points were measured for the mTORC1 data; see Section 4.6. The standard regression approach uses the covariate values at t for explaining Y at , although the time lag steadily increases. The GP method from Section 2.5 predicts the response values at , and replaces (observed Y at ) by (predicted Y at ), where .With both approaches we run MCMC simulations on each dataset, and from the MCMC samples we compute for each covariate X the score that X is a covariate for Y. Our results show that the proposed GP method finds the true covariates X1 and X2, while the standard approach cannot clearly distinguish them from the irrelevant variables . Figure 2 shows histograms of the average covariate scores for AR data with and , and for MA data with and q = 10.
Fig. 2.
Average scores (posterior probabilities). In each histogram, the dark grey bars refer to the scores of the true covariates, and the light grey bars refer to the irrelevant variables. Covariate values were generated via AR (top) and MA processes (bottom). The left histograms show the scores of a standard regression (without GP processing). The right histograms show the scores when the proposed GP method is used. Error bars indicate SDs
Average scores (posterior probabilities). In each histogram, the dark grey bars refer to the scores of the true covariates, and the light grey bars refer to the irrelevant variables. Covariate values were generated via AR (top) and MA processes (bottom). The left histograms show the scores of a standard regression (without GP processing). The right histograms show the scores when the proposed GP method is used. Error bars indicate SDs
4.2 Pre-study 2: Synthetic RAF-pathway data
The RAF pathway, as reported in Sachs , consists of N = 11 nodes and 20 directed edges. The topology of the RAF pathway is shown in Part E of the Supplementary Material. We generate data with K = 2 components and T = 10 data points each. The parent nodes of each node Z build its covariate set . We assume a linear model with component-specific regression coefficients:
where denotes the value of node Z at time point t in component k, and is the regression coefficient for in component k. The noise values and the initial values are sampled from independent distributions. For Z there are component-specific regression coefficients. For each Zi we collect them in two vectors (k = 1, 2), and we sample the elements of from N(0, 1) Gaussian distributions. We then re-normalize the vectors to Euclidean norm one: (k = 1, 2). We distinguish six scenarios:(S1) Identical: We withdraw and assume that the same regression coefficients apply to both components. We set: for all i.(S2) Identical signs (correlated): We enforce the coefficients to have the same signs, i.e. we replace by: for all i and j.(S3) Uncorrelated: We use the vectors for component k (k = 1, 2). The component-specific coefficients and are then uncorrelated for all i and all j.(S4) Opposite signs (negatively correlated): We withdraw the vector and we set: . The coefficients and are then negatively correlated.Mixture of (S1) and (S3): We assume that 50% of the coefficients are identical for both k, while the other 50% are uncorrelated. We randomly select 50% of the coefficients and set: . The other 50% of the coefficients stay unchanged (uncorrelated).Mixture of (S1) and (S4): We withdraw and we assume that 50% of the coefficients are identical for both k, while the other 50% have an opposite sign. We randomly select 50% of the coefficients and set: . For the other coefficients we set .For each scenario we generate 25 datasets. We then analyse every dataset with each model. Figure 3 shows the average AUC values for reconstructing the RAF pathway. Only for scenario (S1), where all coefficients are constant, the models perform equally well. For (S2–S6) the homogeneous DBN is substantially worse than the NH-DBNs. The coupled NH-DBN is slightly superior to the (non-coupled) NH-DBN. The proposed partially NH-DBN yields the highest average AUC scores.
Fig. 3.
Network reconstruction accuracy for RAF pathway. The histograms show the average precision-recall AUC values. Each AUC is averaged across 25 datasets and the error bars indicate SDs. The bars refer to: the homogeneous DBN (white), the NH-DBN model (light-grey), the coupled NH-DBN (dark-grey) and the partially NH-DBN (black). For (S2–5) the AUC differences are significantly in favour of the new partially NH-DBN (two-sided paired t-test P-values < 0.05)
Network reconstruction accuracy for RAF pathway. The histograms show the average precision-recall AUC values. Each AUC is averaged across 25 datasets and the error bars indicate SDs. The bars refer to: the homogeneous DBN (white), the NH-DBN model (light-grey), the coupled NH-DBN (dark-grey) and the partially NH-DBN (black). For (S2–5) the AUC differences are significantly in favour of the new partially NH-DBN (two-sided paired t-test P-values < 0.05)
4.3 Pre-study 3: Scalability and computational costs
To study the scalability of the new network reconstruction method, we generate data for random network structures with nodes. For each node Z we first sample the number of parents x from a Poisson distribution with parameter λ = 1 (‘Poisson in-degree distribution’), before we randomly draw a parent set from a uniform distribution over the system of all parent sets with cardinality x, . Given the network structure, we generate data as described in Section 4.2, i.e. via regression relationships using K = 2 and T = 10. Here we present and discuss the results for the scenario: ‘Mixture of (S1) and (S3)’. For each N we generate 10 independent datasets, i.e. 40 in total. Next, we measure how many MCMC iterations W we can perform in 1 h. With our Matlab implementation on a desktop computer with Intel Xeon 2.5 GHz processor and 8GB of RAM, the average numbers of iterations per hour are: W = 208 637 (N = 10), W = 83 615 (N = 25), W = 41 666 (N = 50) and W = 19 855 (N = 100).To monitor convergence and network reconstruction accuracy in real-time, we perform long MCMC simulations. For each N we select the numbers of iterations such that the simulation takes 16 h. During the simulations we sample 200 equidistant networks per hour (i.e. 3200 networks in total). When withdrawing the first 50% of the networks (‘burn-in’), we have networks after h hours. We use those R networks to assess the performance after h hours of computational time. For running two independent 16 h long MCMC simulations on each of the 40 datasets (computational time: 1280 h), we use a computer cluster.To assess convergence, we consider scatter plots of the edge scores of two independent MCMC simulations on the same dataset. For the largest networks with N = 100, Figure 4a shows superimposed scatter plots for different computational times. It can be seen that after 2–4 h only few edges points deviate from the diagonal, i.e. only few edge scores differ between independent simulations. This is a good indication of convergence. The corresponding scatter plots for the smaller networks with can be found in Part F of the Supplementary Material. As expected, the Supplementary Figures show that the rate of convergence decreases with the size of the network N.
Fig. 4.
Convergence analysis. (a) For each of 10 datasets two independent MCMC simulations have been performed. The edge scores of the two simulations can be plotted against each other. Each panels refer to a computational time and show 10 superimposed scatter plots. (b) Both panels show curves of the average AUC value against the computational time. In the upper panel the network size N varies, while the no. of data points is kept fixed. In the lower panel for N = 100 the no. of data points is varied
Convergence analysis. (a) For each of 10 datasets two independent MCMC simulations have been performed. The edge scores of the two simulations can be plotted against each other. Each panels refer to a computational time and show 10 superimposed scatter plots. (b) Both panels show curves of the average AUC value against the computational time. In the upper panel the network size N varies, while the no. of data points is kept fixed. In the lower panel for N = 100 the no. of data points is variedThe upper panel of Figure 4b monitors the average precision-recall AUC scores along the computational time. The four AUC curves run into plateaus. Already after 1–2 h the AUCs (i.e. the average network reconstruction accuracy) does not improve anymore. The two curves for N = 10 and = 25 converge to AUC = 0.72, while the AUC limits are lower for N = 50 (AUC = 0.58) and N = 100 (AUC = 0.49). The individual AUC curves are shown in Part F of the Supplementary Material. Taking into account that a network among N = 100 nodes had to be inferred from 20 data points (K = 2 conditions with T = 10 data points each), the rather low network reconstruction accuracy (AUC = 0.49) is not surprising. To show that higher AUCs can be reached for networks with N = 100 nodes, we repeat the study with T = 20 and = 40. The bottom panel of Figure 4b monitors the average AUC scores for N = 100 and . Here, we had to adapt the numbers of MCMC iterations per hour to W = 12 707 (T = 20), and W = 9343 (. The new curves also run into plateaus and reach higher limits: AUC = 0.70 and = 0.83. The results of this section are compactly summarized in Table 1.
Table 1.
Summary of scalability results
Nodes N
10
25
50
100
100
100
Data points 2·Tk
20
20
20
20
40
80
Iterations per hour
208 637
83 615
41 666
19 855
12 707
9 343
AUC limit
0.72
0.72
0.58
0.49
0.70
0.83
Note: See Section 4.3 for further details.
Summary of scalability resultsNote: See Section 4.3 for further details.
4.4 Reconstructing the yeast gene network topology
By means of synthetic biology, Cantone designed a network with N = 5 genes in S.cerevisiae (yeast); Figure 5 shows the true network. With quantitative Real-Time Polymerase Chain Reaction, Cantone then measured in vivo gene expression data: under galactose- (k = 1) and glucose-metabolism (k = 2). measurements were taken in galactose and in glucose. The data have become a benchmark application, as the network reconstruction accuracies can be cross-compared on real in vivo gene expression data. Figure 5 shows the results, and again a clear trend can be seen: The homogeneous DBN yields the lowest AUC value. The NH-DBN model yields higher AUCs and can be further improved by coupling the regression coefficients (coupled NH-DBN). The proposed partially NH-DBN reaches the highest network reconstruction accuracy. The results are thus consistent with the results for the RAF-pathway data in Section 4.2.
Fig. 5.
Network reconstruction accuracy for yeast gene expression data. The histogram shows the average precision-recall AUC values, averaged across 25 MCMC simulations, with error bars indicating SDs. The AUCs are: 0.61 (DBN), 0.69 (NH-DBN), 0.81 (coupled NH-DBN) and 0.87 (new NH-DBN). All three AUC differences are significant in terms of two-sided t-tests (P < 10–3)
Network reconstruction accuracy for yeast gene expression data. The histogram shows the average precision-recall AUC values, averaged across 25 MCMC simulations, with error bars indicating SDs. The AUCs are: 0.61 (DBN), 0.69 (NH-DBN), 0.81 (coupled NH-DBN) and 0.87 (new NH-DBN). All three AUC differences are significant in terms of two-sided t-tests (P < 10–3)
4.5 The circadian clock network in A.thaliana
The circadian clock network in A.thaliana orchestrates the gene regulatory processes, related to the plant metabolism, with respect to the daily changing dark: light cycles of the solar day. The mechanism of internal time-keeping allows the plant to anticipate each new day, at the molecular level, and to optimize its growth. In K = 4 experiments Arabidopsis plants were entrained in different dark: light cycles, before the gene expressions of N = 9 circadian clock genes were measured under experimentally controlled constant light condition. The numbers of observed time points are and T = 13 for k = 2, 3, 4. For further details on the experimental protocols we refer to Grzegorczyk (2016). Figure 6a shows a network that was inferred with the new partially coupled model. For the prediction, we extracted the 21 edges with edge scores higher than the threshold . A proper biological evaluation of the network is hindered and beyond the scope of this article, as the true circadian clock network has not been fully discovered yet. However, our predicted network in Figure 6a contains many edges that are consistent with hypotheses from the plant biology literature. In particular, the high-scoring feedback loop LHY ↔TOC1 seems to be the most important key feature of the circadian clock network (see, e.g. Locke ). Moreover, it has been reported that LHY is a regulator of the genes ELF3 and ELF4 (Alabadi ; Kikis ). Also the edge LHY →CCA1 is not unexpected, as LHY and CCA1 are known to be biological homologues (Miwa ). Four more edges, for which we could find biological literature references, are: GI →TOC1 (Locke ), ELF3 →LHY (Kikis ), ELF3 →TOC1 (Dixon ) and ELF3 →PRR9 (Chow ).
Fig. 6.
Shown are the edges whose scores exceeded the threshold ; edges are labelled with their scores. Edges with scores higher than are in bold. (a) Inferred circadian clock network in Arabidopsis thaliana. (b) Inferred topology of the mTORC1 signalling pathway
Shown are the edges whose scores exceeded the threshold ; edges are labelled with their scores. Edges with scores higher than are in bold. (a) Inferred circadian clock network in Arabidopsis thaliana. (b) Inferred topology of the mTORC1 signalling pathway
4.6 The mTORC1 network
The mammalian target of rapamycin complex 1 (mTORC1) is a serine/threonine kinase which is evolutionary conserved and essential in all eukaryotes (Saxton and Sabatini, 2017). mTORC1 is at the centre of a multiply wired, complex signalling network, whose topology is well studied and contains several well-characterized feedback loops (Saxton and Sabatini, 2017). Hence, we used the mTORC1 network as a surrogate based on which we can objectively evaluate the predictive power of our partially NH-DBN model for learning network structures. The signalling network converging on mTORC1 is built by kinases, which inactivate or activate each other by phosphorylation. Thus, a protein can be phosphorylated at one or several sites, and the phosphorylations at these positions determine its activity. Signalling through the mTORC1 network is elicited by external signals like insulin or amino acids. Dalle Pezze relatively quantified 11 phosphorylation states of 8 key proteins across the mTORC1 signalling network by immunoblotting; for an overview see Table 2. Dynamic time course data were obtained under two experimental conditions, namely upon stimulation with amino acids only (k = 1), and with amino acids plus insulin (k = 2). The phosphorylation states were measured at T = 10 time points: minutes, so that the time lag increases from 1 to 60. We therefore apply the GP method from Section 2.5 to predict equidistant response values, before analysing the data with the proposed partially NH-DBN. The 12 edges with scores higher than yield the network topology shown in Figure 6b. A literature review shows that 11 out of the 12 edges have been reported earlier.
Table 2.
mTORC1 timecourse data
Protein
Full name
Sites
mTOR
mammalian target of rapamycin
pS2481, pS2448
PRAS40
proline-rich AKT/PKB substrate 40 kDa
pT246, pS183
AKT
Protein kinase B
pT308, pS473
IRS1
insulin receptor substrate 1
pS636
IR-beta
insulin receptor beta
pY1146
AMPK
AMP-dependent protein kinase
pT172
TSC2
tuberous sclerosis 2 protein
pS1387
p70-S6K
Ribosomal protein S6 kinase beta-1
pT389
Note: Overview to the 8 proteins and the 11 measured phosphorylation sites.
mTORC1 timecourse dataNote: Overview to the 8 proteins and the 11 measured phosphorylation sites.We focus first on the five interactions with the highest scores . Two out of these five interactions are enzyme-substrate relationships: p70-S6K is a kinase which is directly activated by mTORC1 through phosphorylation at threonine 389 [p70-S6K-pT389] (Saxton and Sabatini, 2017). Thus, p70-S6K-pT389 represents a direct readout of mTORC1 activity. p70-S6K phosphorylates IRS1 at serine 636, [IRS1-pS636] (Tzatsos and Kandor, 2006) and mTOR at serine 2448 [mTOR-pS2448] (Dibble and Cantley, 2015), and both edges are correctly identified by our model [p70-S6K-pT389→IRS1-pS636, p70-S6K-pT389→mTOR-pS2448]. Two other interactions with a high score are between AKT-pT308 ↔AKT-pS473. The two phosphorylations are predicted by our model to influence each other, and a positive feedback between phosphorylation events on S473 and T308 of AKT has indeed been demonstrated biochemically (Manning and Toker, 2017). Another high score prediction is between IRS1-pS636 and p70-S6K-pT389 [IRS1-pS636 →p70-S6K-pT389]. Phosphorylation at S636 inhibits IRS1, thereby leading to inhibition of mTORC1 and its substrate p70-S6K-T389 (Tzatsos and Kandor, 2006). Thus, the negative feedback between IRS1-pS639 and p70-S6K-pT389 explains the learned edge between them [IRS1-pS636→p70-S6K-pT389]. In addition, IRS1 inhibition by phosphorylation at S636 results in reduced phosphorylation of AKT at threonine 308, which is in agreement with the learned edge between IRS1-pS636 and AKT-pT308 [IRS1-pS636→AKT-pT308].We could also find evidence for 6 of the remaining 7 edges with scores in between 0.5 and 0.8. PRAS40 is an endogenous mTORC1 inhibitor (Saxton and Sabatini, 2017). The edge from PRAS40-pT246 to PRAS40-pS183 corresponds to a well-described mechanism of PRAS40 regulation: AKT phosphorylates PRAS40 at T246 [PRAS40-pT246], which allows subsequent phosphorylation of PRAS40-S183 by mTORC1 (Nascimento ). This interaction is accurately resembled by our model [PRAS40-pT246→ PRAS40-pS183]. PRAS40’s double phosphorylation dissociates PRAS40 from mTORC1, leading to its derepression (Nascimento ). This mechanism is resembled by the edge between PRAS40-S183 and mTOR-S2481 [PRAS40-pS183→mTOR-pS2481], the latter being an autophosphorylation site which directly monitors mTOR activity (Soliman ). Furthermore, the model suggests an edge between p70-S6K-pT389 and PRAS40-pS183 [p70-S6K-pT389 →PRAS40-pS183]. Both are mTORC1 substrate sites (Nascimento ; Saxton and Sabatini, 2017) and are therefore often targeted in parallel. The only predicted edge for which there is to the best of our knowledge no literature evidence is between mTOR-pS2448 and TSC2-pS1387 [mTOR-pS2448 →TSC2-pS1387]. TSC2 is activated by phosphorylation at S1387 and inhibits mTORC1 (Hindupur ). Our model prediction that mTORC1—when phosphorylated at S2448 by p70-S6K—regulates TSC2 remains to be experimentally tested.After having identified 11 of 12 edges as true positives, we performed a literature review to find false negative edges, i.e. edges that our model did not extract, although it has been reported that they exist. This way, we could identify two false negative edges, namely: IR-beta-pY1146→AKT-pT308 and AMPK-pT172→TSC2-pS1387, which were reported in Vigneri and Mihaylova and Shaw (2012), respectively. Finally, we note that this does not imply that the remaining edges that our model did not extract can be assumed to be true negatives. Nowadays incomplete knowledge about the topology of the mTORC1 pathway renders the safe identification of true negative edges impossible. The absence of literature reports on an edge does not necessarily imply that it does not exist.
5 Conclusion and discussion
We propose a new partially NH-DBN model for learning network structures. When data are measured under different experimental conditions, it is rarely clear whether the data can be merged and analysed within one single model, or whether there is need for a NH-DBN model that allows the network parameters to depend on the condition. The new partially NH-DBN has been designed such that it can infer the best trade-off from the data. It infers for each individual edge whether the corresponding interaction parameter is constant or condition-specific. Our applications to synthetic RAF pathway data as well as to yeast gene-expression data have shown that the partially NH-DBN model improves the network reconstruction accuracy. We have used the partially NH-DBN model to predict the structure of the mTORC1 signalling network. As the measured mTORC1 data are non-equidistant, we have applied a GP-based method to predict the missing equidistant values. Results on synthetic data (see Section 4.1) show that the proposed GP-method (see Section 2.5) can lead to substantially improved results.All but one of the predicted interactions across the mTORC1 network are reflected in experiments reported in the biological literature. Dalle Pezze built an ODE-based dynamic model which allows to predict signalling responses to perturbations. Like for many ODE-based models, the topology of this model was defined by the authors, based on literature-knowledge. The ODE model simulations could reproduce the measured mTORC1 timecourse data. Interestingly, all the connections predicted by our new partially NH-DBN model form part of the core model by Dalle Pezze . Hence, we present an alternative unsupervised learning approach, in which the topology of signalling networks is inferred directly from the data. The new model is thus a complementary tool that enhances dynamic model building by predicting the network’s topology in a purely data-driven manner.Although it worked well for the mTORC1 data, we note that the GP method from Section 2.5 requires the time series to be sufficiently smooth. For non-smooth time series the method might not be able to properly predict the values at unobserved time points, leading to biased response values. Then, network reconstruction methods, such as the new partially NH-DBN, will inevitably infer distorted network topologies and wrong conclusions might be drawn. The assumption of smoothness is therefore crucial for the complete analysis pipeline to work. Our results in Section 4.3 show that the new partially NH-DBN can also be used to infer larger networks. However, our results suggest that there is then need for more data points (or higher signal-to-noise ratios, respectively) to reach accurate network predictions. A conceptual advantage of our partially NH-DBN is that it has two established models, namely the homogeneous DBN () and the globally coupled NH-DBN () as limiting cases. The new model operates between them, and as we follow a model averaging approach, it is less susceptible to over-fitting. The edge scores of the partially NH-DBN take the two established models as well as all ‘in-between’ models into account. For sparse data, we would thus expect low edge scores, indicating that we might average across too many models.
Funding
M.S.K. and M.G. are supported by the European Cooperation in Science and Technology (COST) [COST Action CA15109 European Cooperation for Statistics of Network Data Science (COSTNET)]. K.T. acknowledges support from the BMBF e: Med initiatives GlioPATH [01ZX1402B] and MAPTor-NET (031A426B), the German Research Foundation [TH 1358/3-1], the Stichting TSC Fonds (calls 2015 and 2017) and the MESI-STRAT project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under [grant agreement No 754688]. K.T. is recipient of a Rosalind-Franklin-Fellowship of the University of Groningen and of the Research Award 2017 of the German Tuberous Sclerosis Foundation.Conflict of Interest: none declared.Click here for additional data file.
Authors: Irene Cantone; Lucia Marucci; Francesco Iorio; Maria Aurelia Ricci; Vincenzo Belcastro; Mukesh Bansal; Stefania Santini; Mario di Bernardo; Diego di Bernardo; Maria Pia Cosma Journal: Cell Date: 2009-03-26 Impact factor: 41.582
Authors: James C W Locke; Megan M Southern; László Kozma-Bognár; Victoria Hibberd; Paul E Brown; Matthew S Turner; Andrew J Millar Journal: Mol Syst Biol Date: 2005-06-28 Impact factor: 11.429
Authors: James C W Locke; László Kozma-Bognár; Peter D Gould; Balázs Fehér; Eva Kevei; Ferenc Nagy; Matthew S Turner; Anthony Hall; Andrew J Millar Journal: Mol Syst Biol Date: 2006-11-14 Impact factor: 11.429
Authors: Alexander Martin Heberle; Ulrike Rehbein; Maria Rodríguez Peiris; Kathrin Thedieck Journal: Biochem Soc Trans Date: 2021-02-26 Impact factor: 5.407