Zara Ghodsi1, Xu Huang2, Hossein Hassani3. 1. Statistical Research Centre, Bournemouth University, 89 Holdenhurst Road, Bournemouth BH8 8EB, UK; Translational Genetics Group, Bournemouth University, Fern Barrow, Poole BH125BB, UK. 2. Statistical Research Centre, Bournemouth University, 89 Holdenhurst Road, Bournemouth BH8 8EB, UK. 3. Institute for International Energy Studies (IIES), Tehran 1967743 711, Iran.
Abstract
In developmental studies, inferring the regulatory interactions of the segmentation genetic network plays a vital role in unveiling the mechanism of pattern formation. As such, there is a timely demand for theoretical developments and new mathematical models that can describe this genetic network more accurately. Accordingly, this paper seeks to extract the regulatory role of the maternal effect genes using a variety of causality detection techniques and to explore whether these methods can offer a new analytical view of gene regulatory networks. We evaluate three widely used approaches, namely time domain Granger causality, frequency domain Granger causality and convergent cross mapping, and assess the results for statistical significance. Our findings show that the regulatory role of maternal effect genes is detectable in different time classes, which suggests that the approach can be applied to infer possible regulatory interactions among the other genes of this network.
Keywords:
Bicoid; Caudal; Convergent Cross Mapping; Drosophila melanogaster; Segmentation; Time and frequency domain causality
Segmentation in Drosophila melanogaster is a particularly well studied process which highlights the role of gene regulatory networks (GRNs) in the earliest stage of development [1]. In the segmentation GRN, three fundamental types of genes play a crucial role in Drosophila development: maternal effect genes, gap genes and pair rule genes [2]. Among them, the maternal effect genes, including bicoid (bcd) and caudal (cad), are regarded as the most important factors, since they respectively determine most aspects of the anterior and posterior axis of an adult fruit fly and, more importantly, they initiate the sequential activation of the segmentation GRN [2], [3], [4].
The segmentation GRN is perhaps the best-studied transcriptional network in Drosophila development, and there have been considerable attempts to portray the interactions between the regulators in this GRN. Quantitatively, it is common to model GRNs using ordinary differential equations (ODEs) or stochastic ODEs [5], [6]. Although substantial progress has undeniably been made in recent years in modeling transcriptional regulation with these models, the enormous number of regulatory functions they produce and the need to estimate parameters that are difficult to assess experimentally remain two major drawbacks [7], [8]. Recently, the availability of more data on the molecular mechanisms of regulatory interactions has made it possible to study these interactions in more quantitative depth. However, to the best of our knowledge, no study has evaluated the dynamic interactions of this system from a statistical causality point of view [9], [10], [11].
Hence, this paper considers an alternative approach based on various causality detection methods, in order to evaluate whether proper analytical tools can confirm the validity and reliability of genetic inferences derived from experimental evidence. It is of note that a detected regulatory link can be either inductive (i.e. increasing the protein concentration of one gene raises the protein concentration of the other gene) or inhibitory (i.e. increasing the protein concentration of one gene decreases the protein concentration of the other gene). Identifying the nature of a detected interaction would require more extensive research and is beyond the scope of this paper [12].
The analytical methods used in this paper consist of time and frequency domain Granger causality (GC) detection approaches [13] and an advanced non-parametric method, Convergent Cross Mapping (CCM) [14]. The time domain causality test [15] and its extensions are the most common and generally accepted methods in causal inference analysis. The frequency domain causality test extends the time domain test by identifying causality for each individual frequency component instead of computing a single measure for the entire causal association. CCM is an advanced non-parametric method designed for dynamical systems involving complex interactions. The fundamental concept of CCM is that information about the driver variable can be recovered from the driven variable, but not vice versa.
It is imperative to note that, since providing robust genetic evidence is an important step in reporting genetic causality, among all the interactions between regulators in the segmentation GRN we have narrowed this study down to the interactions between bcd and cad, bcd and Kruppel (kr), and cad and kr, which have previously been established by laboratory experimental evidence.
Accordingly, extracting these links using the causality detection techniques mentioned above would justify going a step further and applying these methods to find unknown regulatory links between other genes.
The regulatory role of bcd has been unveiled by several studies [3], [16]. According to Baird-Titus et al. [17], Bcd is one of the few proteins that bind both RNA and DNA targets and can be involved in both transcriptional and post-transcriptional regulation. Bcd enhances the transcription of anterior gap genes such as kr and represses the translation of cad in the anterior region of the embryo [16]. In 2002, through an experimental approach, Niessing et al. showed that the translational repression of cad mRNA by Bcd depends on a functional eIF4E-binding motif [18]. The cad and kr genes are also required for normal segmentation of the embryo. As noted in [19], the interaction of the cad and kr genes is an important input of the segmentation genetic network.
In applying causality detection techniques, it should also be noted that, as several studies have shown, these methods are sensitive to noise [20], [21], [22], and gene expression profiles are exceedingly noisy [23]. As shown in Fig. 1, the profile obtained by the fluorescent antibody technique is highly volatile; in such cases, establishing a cause-and-effect relationship is more challenging and requires a noise filtering step prior to the causality analysis. To overcome these issues, among several noise filtering techniques, we have applied Singular Spectrum Analysis (SSA), a powerful method that has recently become a valuable tool for gene expression signal extraction (see, for example, [24], [25], [26], [27]).
Fig. 1
A typical example of noisy Bcd, Cad and Kr for embryo ms26 at time class 14(1). Black, blue and green colours depict Bcd, Cad and Kr profiles respectively. The x-axis shows the position of the nuclei along the Anterior-Posterior (A-P) axis of the embryo and Y-axis shows the fluorescence intensity levels.
The remainder of this paper is organised such that Section 2 describes the analytical methods used in this study which is followed by description of the data in Section 3. Section 4 summarises the empirical results and the paper concludes with a concise summary in Section 5.
Causality detection and noise filtering techniques
Time domain Granger causality
The Granger causality test [15] is the most widely accepted method for causality analysis in various disciplines. Applications and developments of this technique, including in the biomedical area, can be found in [28], [29], [30], [31], [32], [33], [34], [35], [36]. The regression formulation of Granger causality states that vector X is a cause of vector Y if the past values of X are helpful in predicting the future value of Y. Two regressions are considered as follows:
$$Y_i = \alpha_0 + \sum_{t=1}^{T} \alpha_t Y_{i-t} + \varepsilon_i, \tag{1}$$
$$Y_i = \alpha_0 + \sum_{t=1}^{T} \alpha_t Y_{i-t} + \sum_{t=1}^{T} \beta_t X_{i-t} + \varepsilon_i, \tag{2}$$
where i = 1,2,⋯ ,N (N is the number of observations), T is the maximal time lag, α and β are vectors of coefficients, and ε is the error term. The first regression predicts Y using the history of Y only, while the second regression predicts Y using the past information of both X and Y. Causality is therefore concluded if the second model is significantly better than the first one.
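The comparison of the two regressions can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the study: the function names, the simulated series and the choice of an F statistic on the residual sums of squares are assumptions made here for the example.

```python
import numpy as np

def lagged_design(y, x, T):
    """Response vector and design matrices of the two regressions for lags 1..T."""
    n = len(y)
    target = y[T:]
    ones = np.ones((n - T, 1))
    y_lags = np.column_stack([y[T - t:n - t] for t in range(1, T + 1)])
    x_lags = np.column_stack([x[T - t:n - t] for t in range(1, T + 1)])
    restricted = np.hstack([ones, y_lags])            # history of Y only
    unrestricted = np.hstack([ones, y_lags, x_lags])  # history of Y and X
    return target, restricted, unrestricted

def rss(design, target):
    """Residual sum of squares of an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(x, y, T):
    """F statistic for the null hypothesis 'X does not Granger-cause Y'."""
    target, restricted, unrestricted = lagged_design(y, x, T)
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / T) / (rss_u / dof)

# Simulated example (hypothetical data): X drives Y with a one-step delay.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for i in range(1, n):
    y[i] = 0.5 * y[i - 1] + 0.8 * x[i - 1] + 0.1 * rng.normal()

f_xy = granger_f(x, y, T=2)  # large: past X improves the prediction of Y
f_yx = granger_f(y, x, T=2)  # small: past Y does not help to predict X
print(f_xy, f_yx)
```

A large statistic in one direction and a small one in the other is the pattern that would be read as a one-directional link; in practice the statistic is compared with the F distribution to obtain a p-value.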
Frequency domain causality
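The frequency domain causality test extends the time domain GC test by identifying the causality between variables at each frequency. In order to briefly introduce the testing methodology, we mainly follow [13], [37]. More details can be found in [38].
It is assumed that the two-dimensional vector $z_i = (Y_i, X_i)'$ containing X and Y (where i = 1,2,⋯ ,N and N is the number of observations) has a finite-order Vector Autoregression (VAR) representation of order p,
$$\Theta(R)\, z_i = \varepsilon_i, \tag{3}$$
where $\Theta(R) = I - \Theta_1 R - \ldots - \Theta_p R^p$ is a 2 × 2 lag polynomial and $\Theta_1,\ldots,\Theta_p$ are 2 × 2 autoregressive parameter matrices, with $R X_i = X_{i-1}$ and $R Y_i = Y_{i-1}$. The error vector $\varepsilon_i$ is white noise with zero mean, and $E(\varepsilon_i \varepsilon_i') = Z$, where Z is a positive definite matrix. The moving average (MA) representation of the system is
$$z_i = \Psi(R)\,\eta_i = \begin{pmatrix} \Psi_{11}(R) & \Psi_{12}(R) \\ \Psi_{21}(R) & \Psi_{22}(R) \end{pmatrix} \begin{pmatrix} \eta_{1i} \\ \eta_{2i} \end{pmatrix}, \tag{4}$$
with $\Psi(R) = \Theta(R)^{-1} G^{-1}$, where G is the lower triangular matrix of the Cholesky decomposition $G'G = Z^{-1}$, such that $E(\eta_i \eta_i') = I$ and $\eta_i = G\varepsilon_i$. The causality test developed in [13] can be written as
$$M_{X \to Y}(\gamma) = \log\left[ 1 + \frac{|\Psi_{12}(e^{-\mathrm{i}\gamma})|^2}{|\Psi_{11}(e^{-\mathrm{i}\gamma})|^2} \right]. \tag{5}$$
According to this framework, no Granger causality from X to Y at frequency γ corresponds to the condition $|\Psi_{12}(e^{-\mathrm{i}\gamma})| = 0$. This condition leads to
$$|\Theta_{12}(e^{-\mathrm{i}\gamma})| = \left| \sum_{k=1}^{p} \Theta_{k,12} \cos(k\gamma) - \mathrm{i} \sum_{k=1}^{p} \Theta_{k,12} \sin(k\gamma) \right| = 0, \tag{6}$$
where $\Theta_{k,12}$ is the (1,2)th element of $\Theta_k$, such that a sufficient set of conditions for no causality is given by [38]
$$\sum_{k=1}^{p} \Theta_{k,12} \cos(k\gamma) = 0, \qquad \sum_{k=1}^{p} \Theta_{k,12} \sin(k\gamma) = 0. \tag{7}$$
Hence, the null hypothesis of no Granger causality at frequency γ can be tested by using a standard F-test for the linear restrictions (7), which follows an F(2,B − 2p) distribution for every γ between 0 and π, with B being the number of observations in the series.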
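As a rough illustration of how the two linear restrictions can be tested at a given frequency, the sketch below fits the Y equation of the VAR by least squares and computes the Wald form of the F statistic for the cosine and sine restrictions on the coefficients of the lagged X terms. The function name, the simulated data and the lag order p = 3 are assumptions for this example; it is not the code used to produce the results reported in this paper.

```python
import numpy as np

def frequency_gc_f(x, y, p, gamma):
    """F statistic for 'X does not Granger-cause Y at frequency gamma'.

    Tests sum_k b_k cos(k*gamma) = 0 and sum_k b_k sin(k*gamma) = 0, where b_k
    are the coefficients of X lagged k steps in the Y equation (needs p >= 2).
    """
    n = len(y)
    target = y[p:]
    y_lags = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    x_lags = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    design = np.hstack([np.ones((n - p, 1)), y_lags, x_lags])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    dof = len(target) - design.shape[1]
    sigma2 = float(resid @ resid) / dof
    # Restriction matrix acting only on the X-lag coefficients.
    k = np.arange(1, p + 1)
    restr = np.zeros((2, design.shape[1]))
    restr[0, 1 + p:] = np.cos(k * gamma)
    restr[1, 1 + p:] = np.sin(k * gamma)
    xtx_inv = np.linalg.inv(design.T @ design)
    r_beta = restr @ coef
    middle = np.linalg.inv(restr @ xtx_inv @ restr.T)
    return float(r_beta @ middle @ r_beta) / (2 * sigma2)

# Simulated example (hypothetical data): X drives Y with a one-step delay.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for i in range(1, n):
    y[i] = 0.5 * y[i - 1] + 0.8 * x[i - 1] + 0.5 * rng.normal()

gammas = np.linspace(0.1, 3.0, 30)  # frequencies inside (0, pi)
f_xy = np.array([frequency_gc_f(x, y, 3, g) for g in gammas])
f_yx = np.array([frequency_gc_f(y, x, 3, g) for g in gammas])
print(f_xy.min(), np.median(f_yx))
```

Each value in `f_xy` would be compared with the 5% critical value of the F distribution with 2 numerator degrees of freedom (roughly 3 for long series) to decide whether causality is detected at that frequency.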
Convergent Cross Mapping (CCM)
Convergent Cross Mapping (CCM) was first introduced in [14]. It aims at detecting causation among time series and provides a better understanding of dynamical systems that are not covered by other well established methods such as Granger causality. CCM has proven to be an advanced non-parametric technique for distinguishing causation in dynamic systems with complex interactions, in biological studies and ecosystems; more details can be found in [14], [39], [40], [41]. CCM is briefly introduced below, mainly following [14].
Assume there are two variables X and Y, for which X has a causal effect on Y. The CCM test evaluates causation by checking whether the historical record of Y can be used to obtain reliable estimates of X. Given a library set of n points (not necessarily equal to the total number of observations N of the two variables), with i = 1,2,⋯ ,n, lagged coordinates are used to generate an E-dimensional embedding state space [42], [43], in which the points are the library vectors $X_i = \{x_i, x_{i-1}, \ldots, x_{i-(E-1)}\}$ and the prediction vectors $Y_i = \{y_i, y_{i-1}, \ldots, y_{i-(E-1)}\}$.
The E + 1 neighbors of Y from the library set X are selected; they form the smallest simplex that contains Y as an interior point. The forecast is then made from these neighbors, which is the nearest-neighbor forecasting algorithm of simplex projection [43]. The optimal E is selected based on the forecasting performance of these nearby points in the embedding state space.
Therefore, by adopting the essential concept of Empirical Dynamic Modeling (EDM) and the generalized Takens' Theorem [42], two manifolds are constructed from the lagged coordinates of the two variables under evaluation: the attractor manifold $M_Y$ constructed from Y and the manifold $M_X$ constructed from X. Causation is then identified if the nearby points on $M_Y$ can be employed to reconstruct the observed X. Note that the correlation coefficient ρ is used to estimate cross map skill because it is widely accepted and understood; additionally, leave-one-out cross-validation, which is considered a more conservative method, is adopted for all evaluations in CCM.
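The cross mapping step described above can be sketched as follows. This is a simplified illustration under stated assumptions: a unit time lag, exponential distance weights for the E + 1 nearest neighbors, the full series used as the library, and a pair of coupled logistic maps as toy data. The function names and parameter values are chosen for this example only and are not taken from the study.

```python
import numpy as np

def embed(series, E):
    """Lagged-coordinate vectors (s_i, s_{i-1}, ..., s_{i-(E-1)}) with unit lag."""
    n = len(series)
    return np.column_stack([series[E - 1 - k:n - k] for k in range(E)])

def cross_map_skill(x, y, E):
    """Correlation between X and its estimate from the manifold built on Y."""
    m_y = embed(y, E)
    x_target = x[E - 1:]
    dist = np.linalg.norm(m_y[:, None, :] - m_y[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # leave-one-out: never use the point itself
    nn = np.argsort(dist, axis=1)[:, :E + 1]     # indices of the E + 1 nearest neighbors
    d = np.take_along_axis(dist, nn, axis=1)
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12)) # weights relative to the closest neighbor
    w /= w.sum(axis=1, keepdims=True)
    x_hat = (w * x_target[nn]).sum(axis=1)       # cross-mapped estimate of X
    return float(np.corrcoef(x_target, x_hat)[0, 1])

# Toy system (assumption for illustration): two coupled logistic maps in which
# X influences Y more strongly than Y influences X.
n = 500
x = np.empty(n)
y = np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

rho = cross_map_skill(x, y, E=2)      # skill of recovering X from the manifold of Y
rho_rev = cross_map_skill(y, x, E=2)  # skill of recovering Y from the manifold of X
print(rho, rho_rev)
```

In a full CCM analysis the skill ρ would be recomputed for increasing library sizes, and causation would be inferred only if ρ converges as the library grows; the sketch evaluates a single library size for brevity.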
Singular Spectrum Analysis
SSA is a powerful non-parametric method and has previously been applied for signal extraction from gene expression profiles [24], [25], [26], [27]. The basic SSA method consists of two complementary stages: decomposition and reconstruction [44]. In the first stage, the gene expression profile is decomposed, which allows signal and noise to be differentiated. In the second stage, the less noisy series is reconstructed [45]. A short description of the SSA technique is given below; for more detailed information see, for example, [44], [46].
Embedding. Here, the one-dimensional time series $Y_N = (y_1,\ldots,y_N)$ is transferred into the multi-dimensional series $X_1,\ldots,X_K$ with vectors $X_i = (y_i,\ldots,y_{i+L-1})^{T} \in \mathbf{R}^{L}$, where L (2 ≤ L ≤ N − 1) is the window length and K = N − L + 1. The result of this step is the trajectory matrix $\mathbf{X} = [X_1,\ldots,X_K] = (x_{ij})_{i,j=1}^{L,K}$.
SVD. Here, we perform the SVD of $\mathbf{X}$. Denote by $\lambda_1,\ldots,\lambda_L$ the eigenvalues of $\mathbf{X}\mathbf{X}^{T}$ arranged in decreasing order ($\lambda_1 \geq \ldots \geq \lambda_L \geq 0$) and by $U_1,\ldots,U_L$ the corresponding eigenvectors. The SVD of $\mathbf{X}$ can be written as $\mathbf{X} = \mathbf{X}_1 + \ldots + \mathbf{X}_L$, where $\mathbf{X}_i = \sqrt{\lambda_i}\, U_i V_i^{T}$ and $V_i = \mathbf{X}^{T} U_i / \sqrt{\lambda_i}$ (for $\lambda_i > 0$).
Grouping. The grouping step consists of splitting the elementary matrices into several groups and summing the matrices within each group.
Diagonal averaging. The purpose of diagonal averaging is to transform a matrix to the form of a Hankel matrix, which can subsequently be converted to a time series.
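The four steps above can be sketched with NumPy as follows. This is a minimal illustration, assuming that the signal group is simply a user-supplied list of leading components; the function name, window length and test series are hypothetical and do not reproduce the settings used for the expression profiles.

```python
import numpy as np

def ssa_reconstruct(y, L, components):
    """Basic SSA: embed, decompose, group the chosen components, diagonal-average."""
    n = len(y)
    K = n - L + 1
    # Embedding: L x K trajectory matrix whose columns are lagged vectors.
    traj = np.column_stack([y[i:i + L] for i in range(K)])
    # SVD of the trajectory matrix.
    U, s, Vt = np.linalg.svd(traj, full_matrices=False)
    # Grouping: sum the elementary matrices of the selected components.
    idx = list(components)
    grouped = (U[:, idx] * s[idx]) @ Vt[idx, :]
    # Diagonal averaging: average entries with equal i + j back into a series.
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(L):
        out[i:i + K] += grouped[i, :]
        counts[i:i + K] += 1
    return out / counts

# Hypothetical noisy profile: a sine wave plus Gaussian noise.
rng = np.random.default_rng(2)
t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + 0.3 * rng.normal(size=t.size)

L = 50
full = ssa_reconstruct(noisy, L, range(L))  # all components: returns the input series
signal = ssa_reconstruct(noisy, L, [0, 1])  # two leading components: smoothed signal
print(np.sqrt(np.mean((signal - clean) ** 2)))
```

Using every component recovers the original series, which is a convenient sanity check; keeping only the leading components is what separates the smooth signal from the noise.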
Data
The quantitative bcd, cad and kr gene expression profiles, which represent the protein concentrations of these genes in wild-type Drosophila embryos, were obtained by confocal scanning microscopy of fixed embryos immunostained for segmentation proteins and are available via the FlyEx database (http://urchin.spbcas.ru/flyex/). The applied antibody allows the visualisation of the proteins under study. Such quantification relies on the assumption that the protein concentrations detected by the antibodies and the fluorescence intensities are linearly related to the embryo's natural protein concentration [47], [48].
To this end, a 1024 × 1024 pixel confocal image with 8 bits of fluorescence data was obtained for each embryo and then transformed into an ASCII table. The ASCII table contains the fluorescence intensity levels attributed to each nucleus within the central 10% longitudinal strip along the A-P axis (i.e. only the nuclei located at 45–55% of the dorsoventral (D–V) axis are selected) and has not undergone any noise reduction. Fig. 2 shows an example of a confocal image with the 10% longitudinal strip.
Fig. 2
Confocal image of an embryo at time class 14(1). White horizontal lines depict the 10% strip utilised to collect data.
Since segment determination starts from cleavage cycle 10 and lasts until the end of cleavage cycle 14A (from when proteins synthesised from maternal transcripts begin to appear up to the onset of gastrulation), the data have been categorised into five main cycles, 10 to 14A. Additionally, as cleavage cycle 14A is considerably longer in time, temporal classes 1 to 8 have been considered as subgroups of this cleavage cycle to facilitate the analysis [47], [48]. It should also be noted that each class of data contains a different number of embryos.
Table 1 presents the number of embryos studied in each time class. It is of note that the expression profile of each embryo has a different length; the third column of this table reports the average length.
Table 1
Different time classes and the embryos studied per each time class.
Time class   N    Length   SD
10           5    127      18.83
11           12   276      25.83
12           15   489      97.18
13           47   1224     78.56
14(1)        28   2318     143.87
14(2)        15   2315     86.83
14(3)        20   2367     141.05
14(4)        17   2309     119.16
14(5)        14   2301     126.96
14(6)        18   2347     103.74
14(7)        13   2007     229.61
14(8)        12   1600     311.21
Note: N=Number of embryos studied per each time class, Length=The average length of data of expression profiles, SD=Standard deviation of length of data.
Although confocal scanning microscopy is a commonly employed technique for measuring gene expression profiles, its use in systems biology studies presents a number of challenges, such as the considerable amount of noise that enters the data when the fluorescence intensity is quantified. Possible errors in instrument functionality, sample preparation and mathematical treatment of data have been considered the most common sources of noise [50]. In order to improve the data cleaning stage and extract the signal from the original noisy data, we have applied SSA. Fig. 3 illustrates the output of this effort. It is evident that the SSA method provides a relatively smooth signal line, with a correlation below 0.10, which indicates a satisfactory level of separation between noise and signal using SSA [27].
Fig. 3
A typical example of noisy Bcd, Cad and Kr along with the extracted signals in red for embryo ms26 at time class 14(1). Black, blue and green colours depict Bcd, Cad and Kr profiles respectively. The x-axis shows the position of the nuclei along the Anterior-Posterior (A-P) axis of the embryo and Y-axis shows the fluorescence intensity level.
Empirical results
This section summarises the results obtained by applying the three causality detection approaches before and after filtering the expression profiles using SSA. For all evaluations, we have ensured that all the test requirements are satisfied by choosing the optimal indices. Table 2 shows the findings of the causality detection analysis on Bcd and Cad profiles, where “YES” indicates that the adopted test detected the regulatory relationship. The p-values reported for the time domain GC test are the average p-values attained for each time class. For the time domain GC test, the co-integration test is conducted only for those variables having one unit root. Since none of the tested groups showed significant evidence of co-integration, the co-integration test results are not reported here. The optimal lag for each VAR model is selected by comparing the information criteria matrix, which includes results based on the AIC [51], HQ [52], SIC [53] and FPE [54] criteria.
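Lag selection by an information criterion can be illustrated with a short sketch. It is a simplification made for this example: only the AIC is computed, only the equation for Y is fitted rather than the full VAR system, and the function names and simulated data are hypothetical.

```python
import numpy as np

def aic_for_lag(x, y, p, p_max):
    """AIC of the Y equation with p lags of Y and X, on a common sample."""
    n = len(y)
    target = y[p_max:]  # same observations for every p, so AICs are comparable
    cols = [np.ones(n - p_max)]
    for k in range(1, p + 1):
        cols.append(y[p_max - k:n - k])
        cols.append(x[p_max - k:n - k])
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    m = len(target)
    return float(m * np.log(resid @ resid / m) + 2 * design.shape[1])

def select_lag(x, y, p_max):
    """Return the lag with the smallest AIC and all the scores."""
    scores = {p: aic_for_lag(x, y, p, p_max) for p in range(1, p_max + 1)}
    return min(scores, key=scores.get), scores

# Simulated example (hypothetical data): Y depends on X lagged two steps.
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for i in range(2, n):
    y[i] = 0.3 * y[i - 1] + 0.7 * x[i - 2] + 0.5 * rng.normal()

best_lag, scores = select_lag(x, y, p_max=6)
print(best_lag)
```

The other criteria (HQ, SIC, FPE) differ only in the penalty attached to the number of parameters, so the same loop could be reused with a different scoring function.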
Table 2
A summary of the causality tests results for Bcd on Cad profiles.
             Time domain GC                       Frequency domain GC    CCM
             Noisy series      Filtered series    Noisy     Filtered     Noisy     Filtered
Time class   YES/NO  p-Value   YES/NO  p-Value    YES/NO    YES/NO       YES/NO    YES/NO
10           NO      0.68      NO      0.45       NO        YES          YES       YES
11           NO      0.71      NO      0.33       NO        YES          YES       YES
12           NO      0.89      NO      0.32       NO        YES          YES       YES
13           NO      0.89      NO      0.24       NO        YES          YES       YES
14(1)        NO      0.95      YES     0.05       NO        YES          YES       YES
14(2)        NO      0.98      YES     0.04       NO        YES          YES       YES
14(3)        NO      0.98      YES     0.01       NO        YES          YES       YES
14(4)        NO      0.94      YES     0.01       NO        YES          YES       YES
14(5)        NO      0.95      YES     0.00       NO        YES          YES       YES
14(6)        NO      0.96      YES     0.00       NO        YES          YES       YES
14(7)        NO      0.81      YES     0.00       NO        YES          YES       YES
14(8)        NO      0.79      YES     0.04       NO        YES          YES       YES
Note: Series are differenced where needed to achieve stationarity prior to the tests; optimal lag lengths are chosen based on the AIC, HQ, SIC and FPE criteria. “YES” stands for a detected regulatory link and “NO” means the regulatory link could not be detected by the adopted test.
According to Table 2, the results differ markedly before and after noise reduction. In the presence of noise, the regulatory link between Bcd and Cad is detected by neither the time domain nor the frequency domain test. Accordingly, the filtering capability of SSA is clearly advantageous for causality detection analysis.
Nevertheless, the ability of the CCM method to capture the regulatory link has not been affected by noise, and the results achieved by this test confirm the regulatory relationship between Bcd and Cad in expression profiles both with and without noise. However, regardless of the time class, the index representing the cross mapping ability is on average smaller for noisy series than for filtered series.
It is of note that the length of the data under study varies between time classes. Time classes 10 to 13 and 14(7–8) are shorter than time classes 14(1–6), which may be the reason for the slightly smaller p-values obtained for time classes 11 to 13 and 14(8) compared with the rest of the subclasses of time class 14. Yet the frequency domain test shows less sensitivity to the data length, possibly because this method identifies the possible regulatory link for each individual frequency component rather than for the entire series.
Furthermore, the p-values obtained for the noisy and filtered data of all the embryos in the different time classes are summarised as box and whisker diagrams in Fig. 4 and Fig. 5, respectively. They follow the standard box plot format, displaying the distribution of the p-values based on the maximum, upper quartile, median, lower quartile and minimum. A close look at Fig. 4 and Fig. 5 suggests that the time domain GC test cannot detect any regulatory link in the presence of noise, while the results for filtered series are significant and more consistent, especially for the time classes after 14(1).
Comparing the p-values illustrated in Fig. 4 and Fig. 5, it is evident that the length of the series and the level of intensities have more effect on the results for the noisy data than for the filtered data: the p-values in Fig. 4 become increasingly insignificant for the final subclasses of time class 14, where both parameters decrease in the expression profiles. Likewise, for the frequency domain GC test, the links have been detected for all the filtered series, while no regulatory relationship is detected for the non-filtered ones.
Fig. 4
Box plots of time domain GC test p-values for noisy series. (Circle refers to the corresponding outlier that is more/less than 1.5 times of upper/lower quartile; the central rectangle spans the upper quartile to the lower quartile; the segment inside the rectangle indicates the median; whiskers above and below the box refer to the maximum and minimum.)
Fig. 5
Box plots of time domain GC test p-values for filtered series. (Circle refers to the corresponding outlier that is more/less than 1.5 times of upper/lower quartile; the central rectangle spans the upper quartile to the lower quartile; the segment inside the rectangle indicates the median; whiskers above and below the box refer to the maximum and minimum.)
Table 3 and Table 4 present the results of the analysis conducted to detect the regulatory link between Bcd and Kr profiles and between Cad and Kr profiles, respectively. As can be seen, reducing the noise level is an essential step in detecting the regulatory link using the time domain and frequency domain tests. Similar to the results reported in Table 2, the CCM method can again identify the regulatory relationship even in the presence of noise.
Table 3
A summary of the causality tests results for Bcd on Kr profiles.
             Time domain GC                       Frequency domain GC    CCM
             Noisy series      Filtered series    Noisy     Filtered     Noisy     Filtered
Time class   YES/NO  p-Value   YES/NO  p-Value    YES/NO    YES/NO       YES/NO    YES/NO
12           NO      0.71      NO      0.15       NO        YES          YES       YES
13           NO      0.66      YES     0.04       NO        YES          YES       YES
14(1)        NO      0.89      YES     0.03       NO        YES          YES       YES
14(2)        NO      0.93      YES     0.01       NO        YES          YES       YES
14(3)        NO      0.97      YES     0.01       NO        YES          YES       YES
14(4)        NO      0.94      YES     0.00       NO        YES          YES       YES
14(5)        NO      0.95      YES     0.00       NO        YES          YES       YES
14(6)        NO      0.92      YES     0.00       NO        YES          YES       YES
14(7)        NO      0.81      YES     0.00       NO        YES          YES       YES
Note: Series are differenced where needed to achieve stationarity prior to the tests; optimal lag lengths are chosen based on the AIC, HQ, SIC and FPE criteria. “YES” stands for a detected regulatory link and “NO” means the regulatory link could not be detected by the adopted test.
Table 4
A summary of the causality tests results for Cad on Kr profiles.
             Time domain GC                       Frequency domain GC    CCM
             Noisy series      Filtered series    Noisy     Filtered     Noisy     Filtered
Time class   YES/NO  p-Value   YES/NO  p-Value    YES/NO    YES/NO       YES/NO    YES/NO
12           NO      0.39      NO      0.25       NO        YES          YES       YES
13           NO      0.78      NO      0.11       NO        YES          YES       YES
14(1)        NO      0.84      YES     0.05       NO        YES          YES       YES
14(2)        NO      0.89      YES     0.03       NO        YES          YES       YES
14(3)        NO      0.94      YES     0.01       NO        YES          YES       YES
14(4)        NO      0.91      YES     0.01       NO        YES          YES       YES
14(5)        NO      0.87      YES     0.00       NO        YES          YES       YES
14(6)        NO      0.82      YES     0.00       NO        YES          YES       YES
14(7)        NO      0.75      YES     0.00       NO        YES          YES       YES
Note: Series are differenced where needed to achieve stationarity prior to the tests; optimal lag lengths are chosen based on the AIC, HQ, SIC and FPE criteria. “YES” stands for a detected regulatory link and “NO” means the regulatory link could not be detected by the adopted test.
Fig. 6, Fig. 7 and Fig. 8 depict examples of the results obtained by the frequency domain GC test for the Bcd–Cad, Bcd–Kr and Cad–Kr profile pairs, respectively. In these figures, the blue line represents the test statistic at each specific frequency, and the red line represents the 5% critical value for all the frequencies. The horizontal axis gives the angular frequency w; the corresponding cycle length (period) is 2π/w. When the test statistic is above or very close to the 5% critical value, causality is detected for the corresponding frequency. As the component at each frequency is considered separately when identifying a possible causal link, the impact of having relatively little information is significantly reduced. However, some results for filtered series show only very minor differences between the test statistic and the 5% critical value.
Fig. 6
Frequency domain causality test results for Bcd and Cad before and after filtering (time class 11). The blue line represents the test statistic at each specific frequency, and the red line represents the 5% critical value for all the frequencies.
Fig. 7
Frequency domain causality test results for Bcd and Kr before and after filtering (time class 12). The blue line represents the test statistic at each specific frequency, and the red line represents the 5% critical value for all the frequencies.
Fig. 8
Frequency domain causality test results for Cad and Kr before and after filtering (time class 12). The blue line represents the test statistic at each specific frequency, and the red line represents the 5% critical value for all the frequencies.
For the CCM test, the optimal embedding dimension E has been selected for each pair of gene expression profiles based on the nearest neighbor forecasting performance by simplex projection. Fig. 9, Fig. 10 and Fig. 11 show examples of the CCM test results for Bcd–Cad, Bcd–Kr and Cad–Kr before and after filtering the profiles. In Fig. 9, for example, the red line indicates the reconstruction ability of Bcd cross mapping Cad, while the blue line represents the performance of using historical information of Cad to cross map Bcd. In general, the better a factor X reconstructs the attractor, the more significant the causal effect of the attractor on X. The results of CCM reflect close relationships between Bcd and Cad with and without filtering, while Bcd shows a more significant relationship with Kr than with Cad for both original and filtered data. The cross mapping abilities of Bcd and Cad on Kr are fairly similar; however, Kr clearly shows a higher reconstruction ability on Bcd than on Cad. In more detail, regarding the relationship between Bcd and Cad, the average reconstruction ability represented by ρ suggests that CCM is not affected by the shorter series of the initial time classes. However, the increasing pattern of the average level of cross mapping ability up to time class 14(3), which is followed by a decreasing trend for the rest of the subclasses, indicates less accurate results for the higher time classes. The approximate average value of ρ above 0.5 for noisy series indicates significant cross mapping (or reconstruction) ability to identify the causal links. Correspondingly, the average for filtered series is found to be approximately above 0.8, which reflects the stronger causal links detected between Bcd and Cad after filtering.
Regarding the relationship between Bcd and Kr, both original and filtered series indicate stronger cross mapping ability from Kr to Bcd, which means that Bcd has a more powerful regulatory effect on Kr than the other way around. However, this link is slightly more significant in the filtered profiles. In the case of Cad and Kr, the regulatory relationship identified is less significant than for the other pairs of genes considered in this study, and the average of 0.4 for filtered profiles, compared with the average of 0.2 for original series, highlights the role of SSA in improving the results.
Fig. 9
CCM test results for Bcd and Cad before and after filtering (time class 14(8)). The red line indicates the reconstruction ability of Bcd cross mapping Cad, while the blue line represents the performance of Cad cross mapping Bcd.
Fig. 10
CCM test results for Bcd and Kr before and after filtering (time class 14(7)). The red line indicates the reconstruction ability of Bcd cross mapping Kr, while the blue line represents the performance of Kr cross mapping Bcd.
Fig. 11
CCM test results for Cad and Kr before and after filtering (time class 14(5)). The red line indicates the reconstruction ability of Cad cross mapping Kr, while the blue line represents the performance of Kr cross mapping Cad.
It is of note that the overall findings of this research are consistent with previous efforts in mathematical modeling of the segmentation network [55], [56], [57]. For example, Surkova et al. [55] present a successful canalization study of the four gap genes hunchback (hb), giant (gt), knirps (kni) and kr using the gene circuit method, which uses the concentrations of the bcd, cad and tailless (tll) genes as external inputs.
Conclusion
Even though the regulatory roles of bcd on cad, bcd on kr and cad on kr have previously been reported through several genetics experiments, they have not been validated using any causality detection method. Hence, extracting the regulatory links between these expression profiles was central to this study. We therefore tested various models on real data to ensure the validity of the findings, applying the three causality detection approaches before and after filtering the expression profiles. According to the results obtained, the accuracy of the data is of critical importance for the success of causality detection studies. Using the time domain and frequency domain GC tests, the regulatory link can be detected only after removing the noise from the expression profiles, which indicates that even a small amount of error in mean intensities may lead to a false negative result.
It is also imperative to note that, for the pairs of genes considered in this study, the time domain GC test fails to detect the regulatory link in most of the early time classes (10–13), the only exception being Bcd–Kr at time class 13 (Table 3). The poor performance of this model here can be attributed to either the length of the data or the low expression level in those time classes. The protein molecules synthesised from maternal transcripts only begin to appear from time class 10, and the number of these morphogens, in the areas where they are concentrated, is lower for time classes 10–13 than for the higher time classes.
The results confirm that there is a regulatory link between bcd and cad, bcd and kr, and cad and kr. It is worth mentioning that the combined application of our filtering method and the causality methods used in this work provides a means to correct errors and thereby makes it possible to obtain more accurate information from expression profiles.
This can be easily adapted to the other pairs of genes and is also applicable to a wider range of GRNs to infer the regulatory interactions presented among the genes of that network.