Nastaran Allahyari¹, Amir Kargaran¹, Ali Hosseiny¹, G R Jafari¹,². ¹ Department of Physics, Shahid Beheshti University, Tehran, Iran. ² Institute of Information Technology and Data Science, Irkutsk National Research Technical University, Irkutsk, Russia.
Abstract
Despite its high and direct impact on nearly all biological processes, the underlying structure of gene-gene interaction networks has so far been investigated mainly in terms of pairwise connections. To address this, we explore the gene interaction networks of the yeast Saccharomyces cerevisiae beyond pairwise interactions using structural balance theory (SBT). Specifically, we ask whether essential and nonessential gene interaction networks are structurally balanced. We study triadic interactions in the weighted, signed, undirected gene networks and observe that balanced triads are overrepresented and unbalanced triads are underrepresented in both networks, in line with the strong notion of balance. Moreover, the energy distribution of triads differs significantly from that of the shuffled networks in both the essential and the nonessential network. This difference is greater in the essential network, in terms of both the frequency and the energy of triads. Additionally, triads in the essential gene network are more interconnected through shared links, while in the nonessential network they tend to be isolated. Finally, we investigate the contribution of signed walks of all lengths and its impact on the degree of balance. Our findings reveal that, when longer cycles are considered, both the essential and the nonessential gene network are more balanced than their corresponding shuffled networks, and the nonessential gene network is more balanced than the essential one.
Today, various studies investigate genomic information based on pairwise connections in gene interaction networks [1]. However, the interesting collective behaviors that emerge from these interactions cannot be described by simply considering pairs of genes. In other words, while studying pair connections has broadened our view of the functionality of genes, the higher-order organization is yet to be explored. To be specific, studies demonstrate that genes are categorized into two main groups, essential and nonessential [2]. Functionally, essential genes play a more vital role in biological processes, and locally they form a denser network compared to nonessential genes. Yet the crucial question raised here is whether there exists a structure beyond these pairwise interactions in these two networks. If so, what is the difference in the underlying structure between the essential and nonessential networks? Suppose that genes A, B, and C are connected in a signed interaction network: is it logical to consider the interaction AB detached from its context, that is, the triad ABC? What is the impact of interactions AC and BC on the interaction between genes A and B? It is known that triadic interactions play a significant role in the construction of real-world networks [3, 4], and structural balance theory (SBT) addresses these interactions. In this work, we apply SBT to the gene interaction networks to answer the following questions: Is there a structure beyond pairwise interaction in the gene interaction networks? Which types of triads, balanced or unbalanced, are over- or underrepresented in these networks compared to the shuffled networks, regarding both the frequency and the energy distributions? Is there a difference between the essential and nonessential networks in the pattern of connection between triads? In addition, when considering cycles of all lengths, which network is more balanced? And do all genes have an equal impact on the networks’ final degree of balance?
These questions are the basis of this study.
SBT was introduced in social psychology by Heider to investigate the structure of tension in networks whose mutual relationships are described in terms of friendship and hostility [5]. Later, Cartwright and Harary generalized this theory to graphs by considering triads as low-dimensional motifs [6]. One of the standard applications of balance theory is to measure the degree of balance (stability) in networks [7-12]. Conversely, quantifying the degree of unbalance (frustration) in a signed network has been proposed as well [13]. Similarly, in biological networks the distance to exact balance is computed [14-17]. Moreover, several researchers have studied the dynamics by which an unbalanced network achieves balance through reducing unbalanced triads [18-25]. Some studies provide further theoretical expansion of balance theory, employing methods from Boltzmann-Gibbs statistical physics to unravel the dynamics behind structural balance [4, 26, 27]. An appealing recent application of balance theory predicts which correlation-matrix coefficients are likely to change sign in the high-dimensional regime [28]. Overall, there have been two main trends in the SBT literature: (1) studying the analytical aspects theoretically [19, 29–35], and (2) applying it empirically to a wide variety of real signed social, economic, ecological, and political networks to clarify their structures [36-43]. Among these applications, it should be mentioned that understanding the structure entirely, not partially, calls for considering not only short-range interactions but also longer-range cycles [44-47]. Accordingly, we analyze the structural balance of gene interaction networks. We study the genetic interaction profile similarity matrices of the yeast Saccharomyces cerevisiae [48, 49], whose genes have been categorized into two main classes, namely, essential and nonessential.
Among all 5500 genes, approximately 1000 are essential because of their vital functional role in biological processes. According to the threshold taken by Costanzo et al. [48], essential genes have higher degrees and are considered hubs in the global network. Thus, these genes play a considerable role in the local structure of the network. On top of that, essential genes have higher prediction power compared to nonessential genes [50, 51].
Here, we investigate the weighted, signed, and undirected networks of genetic interaction for essential and nonessential genes of the yeast Saccharomyces cerevisiae. Primarily, we are interested in probing the existence of structure beyond the pairwise gene interactions in these networks. To this aim, as in our previous study [52], we compare the spectrum of eigenvalues of the genetic interaction matrices with that of their shuffled versions. The rest of the paper is organized as follows. First, we explore the frequency of triads in the gene networks according to the notion of over- and underrepresentation of different types of triads compared to the shuffled networks. Afterward, we assign energy levels to unique configurations of triads and present the triads’ energy distributions. Then, the energy-energy mixing patterns between triads are analyzed to systematically investigate how triads with different energies are connected in the networks. Additionally, we examine the balance of the gene interaction networks by considering cycles of all lengths. Finally, we propose a list of genes which have the highest degree of balance.
Materials and methods
Data
Saccharomyces cerevisiae is a useful model yeast for studying eukaryotes. One of its outstanding characteristics is that almost all bioprocesses in eukaryotes can be found in Saccharomyces cerevisiae [53]. In this study, we analyze its gene interaction similarity networks. The data were provided by Costanzo and colleagues [48], who published three gene interaction similarity matrices: for essential genes, for nonessential genes, and for their combination in the global form [54, 55]. It is helpful to look more closely at what distinguishes the essential and nonessential groups. First, the type of mutation generating the mutants is different: essential and nonessential genes are mutated through temperature-sensitive and deletion mutations, respectively. Topologically, essential genes are more densely connected than nonessential ones and are therefore considered network hubs. Moreover, in the network, essential genes show a stronger functional connection. Besides, in terms of predictive power, essential gene interaction profiles provide higher-accuracy gene function predictions for biological processes. Finally, the biological processes specifically detected in the essential gene similarity network are cell polarity, protein degradation, and ribosomal RNA processing, whereas mitochondrial and peroxisomal functions were identified in the nonessential gene similarity network [48].
The data file analyzed during the current study is available at http://boonelab.ccbr.utoronto.ca/supplement/costanzo2016/. We have worked with data file S3 titled “Genetic interaction profile similarity matrices”.
The steps taken to produce this data are as follows:
(1) Based on the growth rate of the colony consisting of two specific mutated genes, the genetic interaction score (epsilon) between them is obtained.
(2) A genetic interaction profile for each gene is constructed by considering the genetic interaction scores between that gene and a set of other genes in the colony.
(3) The similarity between every two profiles is measured by calculating the Pearson correlation coefficient (PCC).
A positive value in the PCC matrix indicates how functionally similar two genes are, and vice versa for a negative value. Moreover, zero elements show that the two genes are not functionally related. The procedure for obtaining the PCC matrices is presented in Fig 1.
Fig 1
Graphical abstract for the procedure of obtaining the genetic interaction similarity matrices.
In more detail, the analyzed data is based on a subset of the complete Synthetic Genetic Array (SGA) analysis dataset of the yeast Saccharomyces cerevisiae. The SGA dataset is based on genetic interactions of nonessential deletion mutants and/or essential temperature-sensitive mutants. To derive genetic interactions quantitatively, colony size is modeled as a multiplicative combination of double mutant fitness, time, and experimental factors. Succinctly, for a double mutant carrying mutations of genes 1 and 2, colony size c12 can be expressed as c12 = f12 × t × s12 × e, where f12 is the double mutant fitness, t is the incubation time, s12 is the combination of all systematic factors, and e is log-normally distributed random noise. The fitness f12 is written as f12 = f1 × f2 + ε12, where f1 and f2 describe the fitness of the two single mutants, and ε12 (epsilon) is the quantitative measure of the genetic interaction between them. The epsilon is either positive or negative. A negative ε between two mutated genes means that the double mutant is less fit than expected, in the extreme case leading to cell death. Conversely, a positive ε implies that the combination of the two mutated genes results in a phenotype less severe than expected. Through ε, each gene has an interaction profile. In other words, each mutated gene (essential or nonessential) is crossed to a set of other mutated genes (essential or nonessential). Then, the PCC between every two interaction profiles is calculated. Each element of the PCC matrix, which shows the amount of similarity between two interaction profiles, lies between −1 and +1 [48].
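The chain from fitness measurements to similarity values can be illustrated with a minimal sketch. It assumes only the multiplicative model above (ε12 = f12 − f1 × f2) and a plain Pearson correlation between interaction profiles. The function names and the toy values are hypothetical and are not taken from the SGA pipeline, which additionally handles time, systematic factors, and noise.

```python
from math import sqrt

def interaction_score(f12, f1, f2):
    """Genetic interaction score: observed double-mutant fitness minus the
    multiplicative expectation f1 * f2 (epsilon in the text)."""
    return f12 - f1 * f2

def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat profile is treated as unrelated
    return cov / sqrt(var_x * var_y)

def similarity_matrix(profiles):
    """Symmetric PCC matrix over a list of interaction profiles."""
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

# Toy data (hypothetical): three genes, each scored against four partners.
profiles = [
    [-0.30, 0.10, -0.20, 0.05],
    [-0.25, 0.12, -0.18, 0.02],
    [0.20, -0.15, 0.10, -0.05],
]
pcc = similarity_matrix(profiles)

# A double mutant that grows worse than expected gives a negative epsilon.
epsilon = interaction_score(f12=0.50, f1=0.90, f2=0.80)
```

In this toy example the first two genes have similar profiles and therefore a positive PCC, while the third gene is anticorrelated with the first.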
Network analysis
Prior to our main analysis, following the literature [1] and our research aims, we calculated six standard topological and statistical network measurements, namely, mean degree (k), the ratio between the mean of squared degrees and the square of the mean degree (〈k²〉/〈k〉²), modularity, assortativity coefficient, average path length (L), and clustering coefficient (C). In detail, the most elementary characteristic of a network is its k, which tells us how many links each node has to other nodes on average. Besides, 〈k〉² holds information about the values around the mean degree, whereas 〈k²〉 is sensitive to the tail of the degree distribution. Hence, a ratio 〈k²〉/〈k〉² well above one indicates that the tail carries a higher share in the couplings. Modularity measures the strength of the division of a network into modules. Concerning assortativity (disassortativity), a positive (negative) coefficient means that high-degree components tend to be connected with similar (different) counterparts [56]. Also, L is the average, over pairs of nodes, of the minimum number of edges that must be traversed to get from one node to the other [57]. Finally, the value of C states the extent to which the neighbors of a node are also interconnected [58].
To compare networks with different sizes (N) and mean degrees (k) through N- and k-dependent graph measures like C and L, a normalization technique is needed to correct for the effect of N and k. While each normalization method has its advantages and disadvantages, the network type plays a key role in selecting a suitable method to mitigate the N, k-dependence of graph measures. Many empirical networks appear to have small-world characteristics [59]. To investigate whether a real-world network is a small-world network, the small-world index (SW) is used [59, 60]. The SW is defined as the ratio between normalized C and L:
$$\mathrm{SW} = \frac{C / C_{rand}}{L / L_{rand}} \tag{1}$$
Here, $C_{rand}$ and $L_{rand}$ are those of the random network with the same number of nodes and connectivity density. Specifically, small-world networks are characterized by $C > C_{rand}$ and $L \approx L_{rand}$. Thus, a network can be a small-world network if its SW index is greater than one.
Since the values of these indicators in a small-world network lie between those of a lattice and a random network, one may express the normalized indicators as a fraction of the range of possible values. In other words, representing a normalized indicator as
$$\tilde{C} = \frac{C - C_{rand}}{C_{latt} - C_{rand}}, \qquad \tilde{L} = \frac{L - L_{rand}}{L_{latt} - L_{rand}} \tag{2}$$
that is, as a ratio of the range of possible values, reduces the sensitivity to differences in N and k [59]. In this normalization, C (L) is the observed indicator, $C_{rand}$ ($L_{rand}$) is its value in the corresponding random network, and $C_{latt}$ ($L_{latt}$) is its value in the lattice. Specifically, the random network is constructed by shuffling the links without any change in the number of nodes or connectivity density. The ring lattice is created with the same number of nodes, each with degree k, preserving the edge density.
Finally, it should be noted that comparing the small-worldness of two networks with different N and k leads to misleading results. On the one hand, the value of L in small-world networks is close to that of random networks. On the other hand, the value of C is close to that of lattice networks. Thus, normalization implies a bias, i.e., the normalized SW is larger than its non-normalized one. Because of this, the SW is also significantly affected by N and k. Altogether, quantifying the extent to which networks display a small-world structure is a standard way to compare their small-worldness. To this aim, as Muldoon has proposed [61], the small-world propensity (ϕ) is calculated to reflect the deviation of a network’s C and L from those of both lattice and random networks constructed with the same N and k. In the following equation, $\Delta_C$ and $\Delta_L$ denote the deviations of C and L, calculated as $\Delta_C = (C_{latt} - C)/(C_{latt} - C_{rand})$ and $\Delta_L = (L - L_{rand})/(L_{latt} - L_{rand})$, respectively:
$$\phi = 1 - \sqrt{\frac{\Delta_C^2 + \Delta_L^2}{2}} \tag{3}$$
The value of ϕ, which lies between zero and one, is close to one for networks with strong small-world characteristics, while a lower value of ϕ indicates less small-world structure.
Besides graph measures, further investigation of the existence of structure based on spectral analysis can be insightful. When there is no structure beyond pairwise interactions, the network can be regarded as a random one.
In a random network, the distribution of the eigenvalue spectrum has a semicircular form with its body centered around zero [62]. In a nonrandom network, some eigenvalues lie outside the bulk [63]. Also, there is one large eigenvalue, which usually lies far from the bulk of the eigenvalues [64, 65]. This eigenvalue plays a significant role and reflects the global trend of the system.
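The small-world quantities used in this section reduce to a few arithmetic steps, sketched below. The function names are hypothetical; the formulas follow the standard definitions of the small-world index [59, 60] and the small-world propensity [61], and clipping the deviations to the interval [0, 1] is an assumption. As a usage example, the sketch plugs in the normalized clustering coefficients reported in Table 1 and approximates the normalized path length, which the Results describe as close to zero, by exactly zero.

```python
from math import sqrt

def small_world_index(c, l, c_rand, l_rand):
    """SW = (C / C_rand) / (L / L_rand)."""
    return (c / c_rand) / (l / l_rand)

def normalized(observed, random_value, lattice_value):
    """Indicator expressed as a fraction of the random-to-lattice range."""
    return (observed - random_value) / (lattice_value - random_value)

def small_world_propensity(c_norm, l_norm):
    """phi = 1 - sqrt((delta_C^2 + delta_L^2) / 2), with
    delta_C = 1 - c_norm and delta_L = l_norm, both clipped to [0, 1]."""
    delta_c = min(max(1.0 - c_norm, 0.0), 1.0)
    delta_l = min(max(l_norm, 0.0), 1.0)
    return 1.0 - sqrt((delta_c ** 2 + delta_l ** 2) / 2.0)

# Normalized clustering coefficients from Table 1; l_norm = 0.0 is an
# approximation of "close to zero", not a reported value.
phi_essential = small_world_propensity(c_norm=0.139, l_norm=0.0)     # about 0.391
phi_nonessential = small_world_propensity(c_norm=0.067, l_norm=0.0)  # about 0.340
```

Under these assumptions the two propensities come out close to the values listed in Table 1, which suggests that the difference between the networks is driven almost entirely by the clustering term.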
Structural balance theory
To go beyond the assumption that pair interactions are independent and to consider triads as the shortest motif, structural balance theory (SBT) is applied [29]. To consider local triads, we focus on groups of three interacting nodes in the network. There are four types of triads, two balanced and two unbalanced. The idea that “the friend of my friend is also my friend” [+ + +] refers to the strongly balanced triad (T3). Also, the idea that “the enemy of my enemy is my friend” [− − +] points to the weakly balanced triad (T1). Regarding the two other types of signed triads, [+ + −] is the strongly unbalanced triad (T2), and [− − −] is the weakly unbalanced triad (T0); both give rise to frustration in the network [44]. In other words, a triad is balanced if the sign of the product of its links is positive; otherwise, it is unbalanced, or frustrated.
As counting the number of balanced and unbalanced triads provides useful information, efficient computational methods are applied to speed up the counting of triads in large signed networks [66]. Here, we mention one of them, which works with the unsigned (A(|Σ|)) and signed (A(Σ)) adjacency matrices. In the unsigned adjacency matrix, A(|Σ|)(i, j) = 1 if nodes i and j are connected, and A(|Σ|)(i, j) = 0 otherwise. In the signed adjacency matrix, A(Σ)(i, j) = 1 if the link connecting the nodes is positive, and A(Σ)(i, j) = −1 if it is negative. The following two equations count the number of balanced (b) and unbalanced (u) triads, respectively:
$$b = \frac{\mathrm{trace}\left(A(|\Sigma|)^3\right) + \mathrm{trace}\left(A(\Sigma)^3\right)}{12} \tag{4}$$
$$u = \frac{\mathrm{trace}\left(A(|\Sigma|)^3\right) - \mathrm{trace}\left(A(\Sigma)^3\right)}{12} \tag{5}$$
As Leskovec et al. proposed [3], we have created a null model against which to compare the empirical frequencies of triads. In generating the null model, it is important to keep the exact fraction of positive (negative) signs. Specifically, each randomly chosen link connecting two existing nodes is shuffled.
Thus, the created null model represents no organization in the structure. Then, the fraction of each type of triad in the shuffled network ($p_0(T_i)$) is calculated. The triad type $T_i$ is overrepresented if its fraction in the original network ($p(T_i)$) is larger than that in the shuffled one; otherwise, it is underrepresented. Next, the surprise ($s(T_i)$) is calculated, which is the number of standard deviations by which the actual number of triads of type i differs from its expected number under the null model:
$$s(T_i) = \frac{T_i - E[T_i]}{\sqrt{\Delta\, p_0(T_i)\,\left(1 - p_0(T_i)\right)}} \tag{6}$$
Here, $T_i$ is the number of triads of type i, $E[T_i] = \Delta\, p_0(T_i)$ is its expected number, and Δ is the total number of triads, calculated as $\Delta = \mathrm{trace}\left(A(|\Sigma|)^3\right)/6$. To eliminate the effect of size in both networks, $s(T_i)$ is then divided by $\sqrt{\Delta}$.
It has been stated that a balanced network is a network consisting of all positive triads [8]. However, the probability that a real-world network contains only positive triads (positive product of their sides) is close to zero. Thus, a common approach is to measure the degree of balance of a signed network. To this aim, the concept of balance enables us to determine an energy landscape for such networks. Energy describes how much a network is structurally balanced [21, 67]. In a weighted network, the network energy (E) is obtained as the negative sum of the products of the triads’ links ($w_{ij} w_{jk} w_{ki}$) divided by the weighted sum over all triads (Δ), which is calculated as $\Delta = \sum_{i<j<k} |w_{ij} w_{jk} w_{ki}|$:
$$E = -\frac{\sum_{i<j<k} w_{ij}\, w_{jk}\, w_{ki}}{\Delta}$$
For a balanced triad, the product of its weighted links is a positive number, whereas for an unbalanced one this product is negative. If E = −1, then we have a fully balanced network. If it equals +1, then we have a fully unbalanced network. Consequently, in real-world networks, the energy lies between −1 and +1. According to SBT, a network evolves towards the minimum level of tension [67].
The energy landscape introduced above considers the triads individually and does not describe how they are organized in the network globally. Put differently, after calculating the energy of each triad, we aim to investigate how triads are connected through one shared link. The following questions are our concerns in this regard: Do triads form a module, or are they isolated? Does a triad with a high (low) energy value tend to be connected with triads of different energies? To what types of triads, and with what energy values, is a triad with a given energy connected? To answer these questions, the energy-energy mixing pattern is plotted. To be more specific, moving along the sorted energy spectra of two specific types of triads, the number of triads that have a common link is counted. This calculation is repeated for all pairs of triad types. This pattern shows whether particular types of triads are packed together and form a kind of module. It also reveals whether triads exhibit a heterogeneous (homogeneous) form of connections. Moreover, it clarifies whether a triad with a high (low) energy value tends to be connected with triads of different (similar) energies.
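The triad counts and the network energy defined in this section can be sketched for a small toy network as follows. This is an illustrative sketch, not the procedure used for the full gene networks: it enumerates triads directly, which is only practical for small graphs, and it cross-checks the result against the trace-based counts. The function names and the weights are hypothetical, and normalizing the energy by the sum of absolute link products is an assumption chosen so that E stays between −1 and +1.

```python
from itertools import combinations

def trace_of_cube(m):
    """trace(M^3) for a square matrix given as nested lists."""
    n = len(m)
    return sum(m[i][j] * m[j][k] * m[k][i]
               for i in range(n) for j in range(n) for k in range(n))

def triad_census(w):
    """Count triad types and compute the normalized energy of a weighted,
    signed, undirected network given as a symmetric matrix (0 = no link)."""
    counts = {"T0": 0, "T1": 0, "T2": 0, "T3": 0}  # Tn has n positive links
    signed_sum = 0.0
    abs_sum = 0.0
    for i, j, k in combinations(range(len(w)), 3):
        links = (w[i][j], w[j][k], w[k][i])
        if any(x == 0 for x in links):
            continue  # the three nodes do not form a closed triad
        counts["T%d" % sum(x > 0 for x in links)] += 1
        product = links[0] * links[1] * links[2]
        signed_sum += product
        abs_sum += abs(product)
    energy = -signed_sum / abs_sum if abs_sum else 0.0
    return counts, energy

def sign(x):
    return (x > 0) - (x < 0)

# Toy network with four nodes (hypothetical weights).
w = [
    [0.0, 0.5, 0.4, -0.3],
    [0.5, 0.0, 0.2, -0.6],
    [0.4, 0.2, 0.0, 0.1],
    [-0.3, -0.6, 0.1, 0.0],
]
counts, energy = triad_census(w)

# Cross-check with the trace-based counts of balanced and unbalanced triads.
signed = [[sign(x) for x in row] for row in w]
unsigned = [[abs(s) for s in row] for row in signed]
balanced = (trace_of_cube(unsigned) + trace_of_cube(signed)) // 12
unbalanced = (trace_of_cube(unsigned) - trace_of_cube(signed)) // 12
```

Here T3 and T1 together should equal the trace-based number of balanced triads, and T2 and T0 together the number of unbalanced ones. For large networks the trace formulation is preferable, since it avoids enumerating all node triples.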
Walk-based measure of balance and detecting lack of balance
SBT gives specific information for understanding the structural balance of signed networks, but it is biased toward the shortest cycles. Through triads, our analysis recognizes the frustration on the shortest possible cycle, but it overlooks the unbalance associated with longer-range cycles [33]. To extend our analysis to cycles of all possible lengths, note that the balance or unbalance of each cycle is determined by the product of the signs of its links. If the sign of the product is positive, that is, if the number of negative links in the cycle is even, the cycle is balanced. Therefore, if all cycles in a network have a positive sign, the signed network is balanced [44-46]. However, the probability that a real-world network contains only cycles with a positive sign is close to zero. As Estrada proposed in [47], the walk-balance index (K) is used to quantify how close to balance an unbalanced network is. Specifically, walks of all lengths are considered, with more weight assigned to the shorter ones [47]. This method relates a hypothetical equilibrium between the real-world signed network and its underlying unsigned version:
$$K = \frac{\mathrm{trace}\left(\exp(A(\Sigma))\right)}{\mathrm{trace}\left(\exp(A(|\Sigma|))\right)}$$
In K, A(Σ) and A(|Σ|) are the signed and unsigned adjacency matrices, respectively. Elements of A(Σ) are +1 when the interaction matrix values are greater than zero and −1 when they are less than zero. In the unsigned adjacency matrix A(|Σ|), elements are 1 wherever the interaction matrix is nonzero. Another index proposed by Estrada measures the extent of the lack of balance in the network (U), as follows [47]:
$$U = \frac{1 - K}{1 + K}$$
The value of K, as the density of balanced walks of all lengths in the network, is between zero and one.
To be specific, when exp(A(Σ)) in K is expanded, some terms in the numerator, among walks of all possible lengths, can be negative, whereas all terms in the expansion of exp(A(|Σ|)) in the denominator are positive. Thus, if all walks are positive (a balanced network), this index reaches its maximum value, which is one, and U, which measures the amount of unbalance, takes its minimum value, which is zero. Finally, the participation of each node in the balance of the network can be calculated by the degree of balance of a given node i, $K_i$ [47]:
$$K_i = \frac{\left[\exp(A(\Sigma))\right]_{ii}}{\left[\exp(A(|\Sigma|))\right]_{ii}}$$
According to this equation, $K_i$ ranges between zero and one. Thus, the term “highest degree of balance” is assigned to the nodes with $K_i = 1$, which participate only in walks with an even number of negative links. That is, all walks they take part in are balanced.
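The walk-based indices can be illustrated on two signed triangles. The sketch below is a minimal illustration under stated assumptions: a truncated Taylor series stands in for a library matrix exponential, which is adequate only for small matrices with small entries, and all function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=40):
    """exp(M) as the truncated series sum of M^p / p! for p < terms."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms):
        # term becomes M^order / order!
        term = [[x / order for x in row] for row in mat_mul(term, m)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

def walk_balance(signed):
    """Return (K, U, [K_i]) for a signed adjacency matrix with entries
    in {-1, 0, +1}."""
    unsigned = [[abs(x) for x in row] for row in signed]
    exp_signed = mat_exp(signed)
    exp_unsigned = mat_exp(unsigned)
    n = len(signed)
    k_nodes = [exp_signed[i][i] / exp_unsigned[i][i] for i in range(n)]
    k = (sum(exp_signed[i][i] for i in range(n))
         / sum(exp_unsigned[i][i] for i in range(n)))
    u = (1.0 - k) / (1.0 + k)
    return k, u, k_nodes

balanced_triangle = [[0, 1, -1], [1, 0, -1], [-1, -1, 0]]      # [+ - -]
frustrated_triangle = [[0, -1, -1], [-1, 0, -1], [-1, -1, 0]]  # [- - -]

k_bal, u_bal, nodes_bal = walk_balance(balanced_triangle)
k_fru, u_fru, nodes_fru = walk_balance(frustrated_triangle)
```

For the balanced triangle, K equals one and U equals zero up to rounding, whereas the frustrated triangle gives K below one and a positive U. In the frustrated triangle all three nodes share the same degree of balance by symmetry.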
Results
Based on our main research questions, six standard and informative network indicators, i.e., mean degree (k), the ratio between the mean of squared degrees and the square of the mean degree (〈k²〉/〈k〉²), modularity, assortativity coefficient, average path length (L), and clustering coefficient (C), are calculated. Specifically, by computing C we observe the networks’ tendency to form triads, which are the basic building blocks in the balance theory framework. Likewise, modularity provides information on the networks’ communities, which is a crucial feature in gene network studies. Also, if the networks differ in k as well as in size (N), comparing them requires a normalization technique suited to the network topology to normalize N- and k-dependent indicators like C and L. Since most real-world networks have small-world topology, the small-world index (SW) of our networks is calculated according to Eq (1). The result indicates that the SW of both the essential and the nonessential gene network is greater than one, as in small-world networks: 1.0866 and 1.2402, respectively. Thus, according to what small-world structure implies (the values of C and L lie between those of the lattice and random versions), C and L are normalized through Eq (2) within the range of possible values.
Indicators in both essential and nonessential gene networks are compared in Fig 2. Despite the differences among the measurements, there exist some similarities. As shown in Fig 2, k in the nonessential gene network is higher than in the essential network. Besides, in both networks, the ratio between the mean of squared degrees and the square of the mean degree is close to one. This implies that neither nodes with high degrees nor those with medium degrees are significantly dominant over the others. In addition, the modularity of the essential network is higher than that of the nonessential network.
The higher value of this indicator in the essential gene network indicates a higher tendency to be clustered into multiple sets of strongly interacting parts. Moreover, as shown in Table 1, the assortativity coefficient in both networks is negative but very close to zero, i.e., both networks show weak disassortative behavior. However, the magnitude of disassortativity is one order higher in the essential network. In the radar plot (Fig 2), the absolute values of the assortativity coefficients are shown. Additionally, the values of normalized L in both networks are close to zero, which shows that these networks are densely connected and that there is a very small difference between the observed L and that of the shuffled versions. Likewise, the tendency to form clusters is captured by the normalized C, which is higher in the essential network. Finally, because of the size dependence of SW, the small-world propensity (ϕ) of the networks is calculated through Eq (3) to understand the extent of this characteristic in our networks. The value of ϕ is larger for the essential gene network than for the nonessential network.
Fig 2
The radar plot shows six standard network’s indicators in both essential and nonessential gene networks.
Mean degree, the ratio between mean of squared degrees and squared of mean degree, modularity, assortativity coefficient, normalized clustering coefficient, and small-world propensity. The radar plot for the essential gene network is plotted in blue and for the nonessential gene network in yellow.
Table 1
Network’s indicators.
| Indicator | Essential | Nonessential |
| --- | --- | --- |
| Mean degree (k) | 478.890 | 718.957 |
| 〈k²〉/〈k〉² | 1.0278 | 1.0560 |
| Modularity | 0.033 | 0.018 |
| Assortativity | −0.017 | −0.002 |
| Normalized clustering coefficient ($\tilde{C}$) | 0.139 | 0.067 |
| Small-world propensity (ϕ) | 0.391 | 0.340 |
Mean degree, the ratio between mean of squared degrees and squared of mean degree, modularity, assortativity coefficient, normalized clustering coefficient, and small-world propensity for both essential and nonessential gene networks.
Then, we investigated the existence of clusters in the construction of the essential and nonessential gene networks. Within groups, genes cooperate to carry out a common bioprocess efficiently. Clusters in both essential and nonessential gene networks are illustrated through cluster maps (Fig 3). It can be seen that the essential network has stronger structural modules, in line with the previous result that the essential network is more modular than the nonessential network. In other words, although clusters exist in both networks, the structure in the essential gene network (Fig 3A) is considerably stronger than in the nonessential network (Fig 3B). This also confirms our previous study, where we observed a significant difference between the distributions of eigenvalues in the original matrices and those of the shuffled networks [52]. To be specific, some of the eigenvalues in the original networks are not limited to the narrow bulk of the eigenvalues in the shuffled matrices. Thus, it can be concluded that the structure of the gene interaction networks is far from random.
Fig 3
The cluster map of two essential and nonessential gene networks.
A: Cluster map of essential gene network, B: Cluster map of nonessential gene network.
After studying the clusters, we analyze structural balance in the gene interaction networks to study their structure beyond pairwise interactions. As a first step, the size, the percentage of positive and negative links, and the total number of triads in both networks are reported (Table 2). Next, Eqs (4) and (5) are used to count balanced (b) and unbalanced (u) triads. Then, to compare the dominance of balanced or unbalanced triads in our networks, we apply the method proposed by Leskovec et al. [3]. According to this method, a triad type is overrepresented if its fraction in the original network is higher than in the shuffled network, and underrepresented if it is lower. The fraction of triad type T in the original network is denoted p(T), and in the shuffled network p0(T). Moreover, they proposed the concept of surprise, s(T), given by Eq (6), to quantify how significant these over- and underrepresentations are. Due to the size of the networks, s(T) is large in magnitude, on the order of tens or more. The results indicate that balanced triads are overrepresented in both essential and nonessential gene interaction networks, whereas unbalanced triads are underrepresented compared to their shuffled versions. These results are presented in Table 3.
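The surprise measure can be illustrated with a minimal sketch. It assumes that Eq (6) has the form used by Leskovec et al., s(T) = (|T| − Δ p0(T)) / sqrt(Δ p0(T) (1 − p0(T))), where Δ is the total number of triads; the function name and the toy inputs are hypothetical and are not taken from the paper's data.

```python
import math

def surprise(count, total, p0):
    """Number of standard deviations by which an observed triad count
    differs from its expectation under the null (shuffled) model.

    count: observed number of triads of a given type, |T|
    total: total number of triads, Delta
    p0:    fraction of that triad type in the null model, p0(T)
    """
    expected = total * p0
    std = math.sqrt(total * p0 * (1.0 - p0))
    return (count - expected) / std

# Hypothetical toy numbers: 60 of 100 triads observed, 50% expected by chance.
print(surprise(count=60, total=100, p0=0.5))  # 2.0
```

A positive value indicates overrepresentation and a negative value underrepresentation; because the standard deviation grows only as the square root of Δ, large networks readily produce large values of s(T).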
Table 2
Dataset statistics.
                      Essential      Nonessential
Nodes                 1,040          4,430
Edges                 249,023        1,592,490
+Edges                50.1%          63.5%
−Edges                49.9%          36.4%
Edges / C(N, 2)       0.461          0.162
Triads                20,310,741     81,470,554
Triads / C(N, 3)      0.109          0.006
Number of nodes, edges, triads in both essential and nonessential gene networks with threshold w < |0.05|.
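The normalized edge and triad counts in Table 2 appear to be the raw counts divided by the number of possible node pairs and node triples, respectively. The short check below is a sketch under that assumption (the function names are hypothetical); it reproduces the reported values for the essential network from the node, edge, and triad counts given in the table.

```python
from math import comb

def pair_density(n_nodes, n_edges):
    """Edges divided by the number of possible node pairs, C(N, 2)."""
    return n_edges / comb(n_nodes, 2)

def triple_density(n_nodes, n_triads):
    """Triads divided by the number of possible node triples, C(N, 3)."""
    return n_triads / comb(n_nodes, 3)

# Essential network counts from Table 2
print(round(pair_density(1040, 249023), 3))      # 0.461
print(round(triple_density(1040, 20310741), 3))  # 0.109
```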
Table 3
Number and probability of balanced and unbalanced triads in the original networks compared to the null model.
Essential gene network
                            |Ti|          p(Ti)    p0(Ti)   s(Ti)       s(Ti)/√Δ
Strongly balanced (T3)      3,670,948     0.180    0.124    764.0       0.2
Weakly balanced (T1)        10,362,180    0.510    0.375    1,255.1     0.3
Strongly unbalanced (T2)    4,421,666     0.217    0.374    −1,461.1    −0.3
Weakly unbalanced (T0)      1,855,947     0.091    0.125    −462.0      −0.1
Nonessential gene network
                            |Ti|          p(Ti)    p0(Ti)   s(Ti)       s(Ti)/√Δ
Strongly balanced (T3)      30,868,604    0.378    0.256    2,531.1     0.3
Weakly balanced (T1)        32,704,022    0.401    0.253    3,071.6     0.3
Strongly unbalanced (T2)    16,028,365    0.196    0.441    −4,452.8    −0.5
Weakly unbalanced (T0)      1,869,563     0.022    0.048    −1,071.7    −0.1
|Ti| = the total number of triads of type i; p(Ti) = the fraction of Ti; p0(Ti) = the fraction of Ti in the null model; s(Ti) = the amount of surprise, i.e., the number of standard deviations by which the actual number of Ti differs from its expected number under the null model; and Δ = the total number of triads.
After analyzing the frequency of triads, we examined the energy distribution of the different types of triads. We calculated the energy of each triad by Eq (7). The energy distributions of strongly balanced triads (T3) in Fig 4A, weakly balanced triads (T1) in Fig 4B, strongly unbalanced triads (T2) in Fig 4C, and weakly unbalanced triads (T0) in Fig 4D are presented for both original networks in comparison with their shuffled versions. The results indicate: (1) For all types of triads, in both essential and nonessential networks, most triads have small energy values. (2) In the essential gene network, the T1 triads have the largest average energy, whereas in the nonessential gene network the T3 triads have the largest average energy (Fig 4E). (3) In both gene networks, the average energy of balanced triads is higher than that of their shuffled counterparts, whereas the average energy of unbalanced triads is lower than that of their shuffled counterparts. (4) As shown in Fig 4F, in the essential gene network the relative frequency of the balanced triad T1 alone is approximately equal to the combined relative frequency of the other three types of triads.
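The per-triad energy and the four triad types can be illustrated with a minimal sketch. It assumes that the energy of a triad (i, j, k) is −w_ij w_jk w_ki, following the weighted form of Eq (7) without the network-level normalization, and that Ti denotes a triad with i positive links (so T3 is [+ + +] and T0 is [− − −]). The function name and the toy matrix are hypothetical.

```python
from itertools import combinations

def triad_energies(W):
    """Group triad energies by type T0..T3 (index = number of positive links).

    W is a symmetric weighted signed matrix given as a list of lists;
    a zero entry means the two genes are not linked.
    """
    n = len(W)
    energies = {"T0": [], "T1": [], "T2": [], "T3": []}
    for i, j, k in combinations(range(n), 3):
        links = (W[i][j], W[j][k], W[i][k])
        if any(w == 0 for w in links):
            continue  # the three nodes do not form a closed triad
        n_pos = sum(w > 0 for w in links)
        energies[f"T{n_pos}"].append(-links[0] * links[1] * links[2])
    return energies

# Hypothetical 4-gene example (symmetric, zero diagonal)
W = [
    [0.0, 0.5, -0.2, -0.3],
    [0.5, 0.0, 0.4, -0.6],
    [-0.2, 0.4, 0.0, 0.0],
    [-0.3, -0.6, 0.0, 0.0],
]
print(triad_energies(W))
```

In this toy matrix the triad (0, 1, 2) has two positive links (T2, unbalanced) and positive energy, while the triad (0, 1, 3) has one positive link (T1, balanced) and negative energy, which matches the sign convention that balanced triads lower the energy.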
Fig 4
Energy distributions, average energy, relative frequency for all four types of triads.
A: Energy distribution for strongly balanced triads, B: Energy distribution for weakly balanced triads, C: Energy distribution for strongly unbalanced triads, D: Energy distribution for weakly unbalanced triads. (The energy distribution of triads for original essential gene network and its shuffled network are plotted in blue and red, respectively. The energy distribution of triads for original nonessential gene network and its shuffled network are plotted in yellow and gray, respectively). E: From left to right, the average energy for essential gene network and nonessential gene network. F: From left to right, the relative frequency for essential gene network and nonessential gene network (Green bars for original networks and purple ones for shuffled networks).
Here, we aim to understand how triads are globally organized in the network. To this end, the energy-energy mixing pattern is plotted on a logarithmic scale in Fig 5. The logarithmic scale magnifies the differences between elements with small values. Specifically, our goal is to enrich our analysis by studying the patterns of connection between triads. The results reveal that, overall, there are fewer connected triads than isolated ones. Moreover, T1 triads are more connected to each other than other types. Furthermore, triads with low absolute energy values tend to be connected more than high-energy triads. While this pattern holds for both essential and nonessential gene networks, the essential network has more triads with a shared link. To clarify Fig 5A, the following steps are taken to plot each square, as illustrated in Fig 5B:
Fig 5
The pattern of connection between triads through one shared link in Log scale.
A: All types of pair connected triads (From left to right, essential gene network and nonessential gene network), B: An overview of creating connection between triads for one square (T3 [+ + +] and T2 [+ - +] triads).
1. The spectrums of energy of two specific types of triads are sorted.
2. Through moving on the energy axes, the number of triads that have a common link is counted and saved in a matrix in the Log scale.
3. The previous steps are repeated for all pair types of triads.
4. All 16 squares in 4 rows and 4 columns are merged.
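The steps above can be sketched roughly as follows. This is a simplified illustration rather than the authors' procedure: it pools all triad types into a single square instead of building the 16 type-by-type squares, it assumes the per-triad energy −w_ij w_jk w_ki, and it stores log10(1 + count) so that empty bins stay finite. All function names, the bin count, and the toy matrix are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def shared_link_pairs(W):
    """Energy pairs (E_a, E_b) for all pairs of triads that share one link."""
    n = len(W)
    by_link = defaultdict(list)  # link -> energies of the triads containing it
    for i, j, k in combinations(range(n), 3):
        links = {(i, j): W[i][j], (j, k): W[j][k], (i, k): W[i][k]}
        if all(w != 0 for w in links.values()):
            energy = -W[i][j] * W[j][k] * W[i][k]
            for link in links:
                by_link[link].append(energy)
    pairs = []
    for energies in by_link.values():
        # Sorting mirrors step 1; two distinct triads share at most one link,
        # so every connected pair of triads is counted exactly once.
        pairs.extend(combinations(sorted(energies), 2))
    return pairs

def log_mixing_matrix(pairs, n_bins=4):
    """Bin energy pairs on [-1, 1] x [-1, 1] and store log10(1 + count)."""
    counts = [[0] * n_bins for _ in range(n_bins)]

    def bin_of(e):
        return min(int((e + 1.0) / 2.0 * n_bins), n_bins - 1)

    for a, b in pairs:
        counts[bin_of(a)][bin_of(b)] += 1
    return [[math.log10(1 + c) for c in row] for row in counts]

# Hypothetical toy network: 4 fully connected genes, all weights 0.5
W = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
pairs = shared_link_pairs(W)
M = log_mixing_matrix(pairs)
print(len(pairs))
```

Splitting the triads by type before pairing, and repeating the binning for each of the 16 type combinations, would give the full grid described in steps 3 and 4.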
Now, we extend our analysis by considering walks of all possible lengths. To this end, the degree of balance or imbalance along these walks is measured. We employed two indices introduced by Estrada in [47], so as not to limit ourselves to triads, the shortest cycles. The first is the walk-balance index (K), which quantifies by Eq (8) how close to balance an unbalanced network is. The second represents the extent of the lack of balance (U) in a given signed interaction network by Eq (9). Table 4 presents the values of K and U in both essential and nonessential gene networks. For each index in both networks, there is a marked difference between the value for the original matrix and that for the shuffled matrix. In an unbalanced network, for example a random network that holds no structural information, K would have the lowest possible value, i.e., a value close to zero, and U would have the highest possible value, i.e., a value close to one. As the results indicate, when all walks are considered, both essential and nonessential gene networks are closer to balance than their corresponding shuffled versions. Besides, K is higher in the nonessential gene network than in the essential gene network, and U is much larger in the essential gene network than in the nonessential gene network. Furthermore, there is an index that characterizes the degree of balance of a given node i, denoted K_i, by Eq (10). In the supplementary material, a table classifies the genes with the highest degree of balance (K_i = 1) in terms of the biological processes they annotate.
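The walk-based indices can be illustrated with a minimal sketch. It assumes the definitions from Estrada's walk-based measure of balance: K = tr(exp(A(Σ))) / tr(exp(A(|Σ|))), U = (1 − K) / (1 + K), and K_i as the ratio of the corresponding diagonal entries, where A(Σ) is the signed adjacency matrix and A(|Σ|) its unsigned counterpart. Whether the paper's Eqs (8) to (10) use weights or only signs is not stated here, so the sign-only toy matrices, the function names, and the use of an eigendecomposition for the matrix exponential are all assumptions of this sketch.

```python
import numpy as np

def _exp_diagonal(A):
    """Diagonal of exp(A) for a symmetric matrix, via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs ** 2) @ np.exp(eigvals)

def walk_balance(A):
    """Return (K, U, K_node) for a symmetric signed adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    signed = _exp_diagonal(A)            # closed signed walks per node
    unsigned = _exp_diagonal(np.abs(A))  # closed walks ignoring signs
    K = signed.sum() / unsigned.sum()    # walk-balance index
    U = (1.0 - K) / (1.0 + K)            # extent of the lack of balance
    K_node = signed / unsigned           # degree of balance of each node
    return K, U, K_node

# Hypothetical triangles: one positive link (balanced) and two positive links (unbalanced)
balanced = [[0, 1, -1], [1, 0, -1], [-1, -1, 0]]
unbalanced = [[0, 1, 1], [1, 0, -1], [1, -1, 0]]

K_bal, U_bal, _ = walk_balance(balanced)
K_unb, U_unb, K_node_unb = walk_balance(unbalanced)
print(K_bal, U_bal)
print(K_unb, U_unb)
```

For the balanced triangle K is 1 and U is 0 up to floating-point error, while the unbalanced triangle gives a K strictly between 0 and 1. For matrices as large as the gene networks studied here, a dense eigendecomposition may be costly, so this approach is meant only to make the definitions concrete.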
Table 4
Walk-balance index for all cycles (K), percentage of the lack of balance (U).
                           Essential    Nonessential
K, original network        0.195        0.988
K, shuffled                0.000        0.131
U, original network (%)    67.238       0.575
U, shuffled (%)            99.999       76.749
For the original and shuffled of essential and nonessential gene networks with threshold w < |0.2|.
Discussion
We analyzed gene interactions in the weighted, undirected, and signed networks of the yeast Saccharomyces cerevisiae. The pre-processed dataset includes two matrices, namely, the essential and nonessential gene interaction networks. Here, we explored these two gene networks beyond pairwise interactions in the context of structural balance theory (SBT). The following conclusions have been drawn.
We have found that in both essential and nonessential gene networks balanced triads are overrepresented while unbalanced triads are underrepresented. Interestingly, this finding is in agreement with Heider's balance theory. Specifically, our results empirically support the strong notion of structural balance theory (Table 3). In some social networks, by contrast, the weak formulation of structural balance has been reported as well.
Additionally, we have observed T1 and T0 triads in both gene networks, with higher average energy and higher relative frequency in the essential network. This can be interpreted from the perspective of SBT, in which the presence of T1 and T0 triads in the organization of a network is related to a higher degree of modularity. In other words, the presence of T1 or T0 triads in the stable state of a network indicates that densely connected modules are also connected to others through negative links. This result corresponds to the presence of specialized clusters in the gene interaction network, which is also reflected in the energy-energy mixing pattern between triads with one common link. It is worth mentioning that this pattern is more pronounced in the essential network, as genes in this network are more densely interconnected.
Moreover, we have noted that although the energies of the essential and nonessential networks are not significantly different from each other, the underlying distributions of triads that lead to these final energies are not similar.
As mentioned earlier, the average energy and the relative frequency of unbalanced triads T0 are higher in the essential gene network than in the nonessential network. Thus, they are more likely to experience different possible states. It can therefore be concluded that unbalanced triads T0 provide the essential gene network with the structure needed to sustain the dynamism that is crucial for vital biological mechanisms. For nonessential genes, with fewer unbalanced triads T0, the likelihood of being trapped in a local minimum is higher.
Finally, to extend our analysis, we have calculated two indices by considering walks of all possible lengths: one quantifies how close to balance an unbalanced network is, and the other quantifies the extent to which a given signed network lacks balance when longer-range cycles are considered. The results surprisingly suggest that when walks of all lengths are taken into account, both essential and nonessential gene networks are more balanced than expected from a random allocation of the signs to the links. In other words, both essential and nonessential gene networks, besides balanced triads, respect balanced long-range interactions. Moreover, the nonessential gene network is more balanced and stable than the essential network. As mentioned earlier, the combination of both essential and nonessential interactions constructs the global gene network as a whole. For this network, we have provided in the S1 File a list of the genes with the highest degree of balance, classified in terms of the biological processes they annotate. Thus, our finding highlights genes that are structurally notable and for which further biological analysis would be valuable.
16 Nov 2021
PONE-D-21-31371
The structure balance of gene-gene networks beyond pairwise interactions
PLOS ONE
Dear Dr. Jafari,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses all the points raised by the reviewers. Most of them simply require better explanations, but anyway they are important for the understanding of your work.
Please submit your revised manuscript by Dec 31 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Sergio Gómez
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Partly
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: N/A
Reviewer #2: Yes
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The authors analyse the structure of the gene interaction networks of the yeast Saccharomyces cerevisiae through the lens of structural balance theory. In this sense, they provide evidence about the structural differences between the essential network and the nonessential one and, for each of them, the difference with respect to their shuffled counterpart. In this way, they highlight the strongly non-random nature of the gene networks, as quantified through several measures of balance, while reporting some patterns that may be of potential interest for further research.
There is only one major problem in the present analysis for which I could not find a way out. It regards the way in which the distributions of energy for the four types of triads are constructed (figure 4). Indeed, each type of triad is assigned a certain energy, given by the product of the signs of the three links forming the triad, multiplied by -1. Accordingly, the energy is -1 for the balanced triads (T_1 and T_3) and 1 for the unbalanced ones (T_0 and T_2). (Then, as stated by Eq.4, the normalised sum over all the triads provides the energy of the entire network.) Nonetheless, in figure 4 (and 5, consequently), those energies take values on a continuous range, whereas only the discrete values of -1 and 1 should be allowed. Given that, it is really unclear where those distributions come from. As a consequence, the same holds for the energy-energy correlations reported in figure 5. Given the important weight that such findings have on their work overall, the authors should make this point very clear.
The manuscript is well written.
I can recommend it for publication once the authors have addressed that pinpointed issue.
Please, also note the typos listed below:
- remove question mark in line 11
- change "Leskovek" to "Leskovec" in line 132
Reviewer #2: The paper computes balance for (1) triangles (3-cycles) and (2) all-length walks in gene-gene interaction networks for essential and nonessential genes for the yeast Saccharomyces cerevisiae. Overall, the paper finds that more balance is present in the nonessential genes network than the essential genes in terms of both triad- and all-walks balance. The paper is well-structured and the balance analyses are thorough.
1. Operationalization: It is still unclear what energy-energy mixing means here, specifically in terms of defining what constitute a 'positive' and 'negative' link between two genes. I suggest the authors to set a stronger background that explains how they attribute signs to edges.
2. Operationalization: It would be useful to explain further what constitute an essential vs. nonessential gene; would be helpful to readers from other disciplines.
3. Operationalization: It is still unclear how you define "high impact" of significant genes (page 8)? Do you consider genes that sit on many 'balanced' walks, or are there any measures you used to operationalize 'impact'?
4. Findings: You reported mean degree, the ratio between mean of squared degrees and squared of mean degree, modularity, assortativity coefficient, average path length, and clustering coefficient as indicators of a network's topology. They are useful but not the only indicators of topology; explanations to why these particular measurements are used would be useful.
5. Findings: In table 4, there's an index value of 0 for genes-K shuffled cell. Can you explain the meaning for index of 0 here?
6. Findings: Given the unequal sizes of the genes vs.
nonessential genes network, with the nonessential genes network being much bigger, that could explain the higher 'likelihood' for finding balanced configurations for this particular network. Do you consider controlling for the size of the networks, and/or normalization techniques to mitigate the sampling/size difference here?
**********
6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
22 Dec 2021
Respected Editor,
The authors would like to express their gratitude to you and the referees for the careful and thorough reading of our manuscript titled "The structure balance of gene-gene networks beyond pairwise interactions" and for providing fruitful comments.
We found all comments very crucial, and we have modified our manuscript accordingly as follows:
1) The following sections have been modified according to the PLOS ONE style templates: (A) Affiliations and the corresponding authorship, (B) The order of sections and subsections so that the title of paragraphs in the method section is replaced with subsections, (C) All figures and tables are modified.
2) The first paragraph of the "data" subsection of the method section is enriched to describe the essential and nonessential genes in more detail. Besides, a new paragraph is added to have a stronger background in the data preparation steps.
3) In the "structural balance theory" subsection of the method section, Eq (7) and two sentences within its previous paragraph are modified by replacing S with W to prevent possible misunderstanding of using signs instead of weights. Also, ∆ is replaced with ∆_w to apply the weighted sum of all triads’ energies instead of the binary summation. Hence, Fig 4E is modified. Moreover, to have better consistency in terms of notations, we have updated the terms "adjacency matrix, A" and "connectivity matrix, G" in Eq (4) and (5) to "signed adjacency, A(Σ)" and "unsigned adjacency A(|Σ|)" according to Eq (8), respectively.
4) To the last paragraph of the "walk-based measure of balance and detecting lack of balance" subsection of the method section, a few sentences are added to better clarify the notion of walk balance index (K) and degree of balance index of a given node (K_i), respectively.
5) Three new paragraphs to the "network analysis" subsection of the method section are added to explain how we are allowed based on the network topology to compare size-dependent indicators of networks. To cite the applied method, we have also added three new references as numbers 59-61 to the "network analysis" subsection.
Besides, two paragraphs at the beginning of the result section, Table 2, and Fig 2 are modified.
6) In the result section, a new subfigure is added as Fig 5B, along with new explanations to better clarify Fig 5A.
7) The supplementary file is updated by classifying the reported genes with the highest degree of balance in terms of the biological processes that are annotated.
Please find our attached point-by-point response to the reviewers' concerns. All changes are marked red in the revised manuscript with track changes, and page numbers that are provided in this response are according to this file as well. We highly appreciate your valuable time in considering the revised version of our manuscript.
Sincerely,
G. R. Jafari
Full professor
Department of Physics, Shahid Beheshti University, Evin, Tehran, Iran
Institute of Information Technology and Data Science, Irkutsk National Research Technical University, Lermontova, Irkutsk, Russia
Email: g_jafari@sbu.ac.ir, gjafari@gmail.com
#Reviewer 1
1. There is only one, a major problem in the present analysis for which I could not find a way out. It regards the way in which the distributions of energy for the four types of triads are constructed (figure 4). Indeed, each type of triad is assigned certain energy, given by the product of the signs of the three links forming the triad, multiplied by -1. Accordingly, the energy is -1 for the balanced triads (T_1 and T_3) and 1 for the unbalanced ones (T_0 and T_2). (Then, as stated by Eq.4, the normalized sum over all the triads provides the energy of the entire network.) Nonetheless, in figure 4 (and 5, consequently), those energies take values on a continuous range, whereas only the discrete values of -1 and 1 should be allowed. Given that, it is really unclear where those distributions come from. As a consequence, the same holds for the energy-energy correlations reported in figure 5.
Given the important weight that such findings have on their work overall, the authors should make this point very clear.
Response: Thanks to the respected reviewer for this fruitful comment. It should be mentioned that in Eq (4), the energy of the network is computed as −1 times the product of the “weights” of the three links forming the triad (not just the signs), divided by the weighted sum of all triads’ energies (∆_w) in the network. That is, the link weights w_ij, w_jk, w_ki of the triad “ijk” are multiplied to give its energy. To correct this in the manuscript, we replaced S in Eq (1) with W according to Eq (2) to avoid mistaking the sign for the weight. Besides, we replaced ∆ with ∆_w to apply the weighted sum of all triads’ energies instead of the binary summation, as follows:
E = -\frac{\sum s_{ij} s_{jk} s_{ki}}{\Delta} \quad (s = \pm 1) \qquad (1)
to
E = -\frac{\sum w_{ij} w_{jk} w_{ki}}{\Delta_w} \quad (-1 \le w \le +1). \qquad (2)
Therefore, the distribution of energy for balanced triads T_3 and T_1 (Fig 4A and B), as well as for unbalanced triads T_2 and T_0 (Fig 4C and D), is calculated by multiplying −1 by the weights of the links between each three interconnected genes. Consequently, in Fig 5, energies can take values on a continuous range between −1 and +1. It is worth mentioning that in the standard balance theory, links are unweighted and characterized only by positive and negative signs, whereas in real-world networks the weights of links are as crucial as the signs (ref. 1). A part of the matrix we worked on is shown here. As can be seen, it is a weighted and signed network with link weights between −0.4 and +0.8 (Fig 1).
Figure 1. A part of the studied matrix as an example for further clarification (ref. 2).
To make this indispensable point clearer in the manuscript, we add two sentences for a more precise explanation in the method section, on page 6, lines 229-234.
Furthermore, Eq (4) (in the revised manuscript Eq (7)) is corrected by replacing S with W, and by using the weighted sum of all triads’ energies (∆_w) instead of binary summation (∆). Hence, we modified Fig 4E. We would like to greatly thank the reviewer for the careful and insightful review and thoughtful comment.
2. Please, also note the typos listed below:
- remove question mark in line 11
- change "Leskovek" to "Leskovec" in line 132
Response: Thanks to the reviewer for this comment. Both typos are corrected in the revised manuscript.
#Reviewer 2
1. Operationalization: It is still unclear what energy-energy mixing means here, specifically in terms of defining what constitutes a ’positive’ and ’negative’ link between two genes. I suggest the authors set a stronger background that explains how they attribute signs to edges.
Response: Thanks to the respected reviewer for this fruitful comment. Besides calculating triads’ energies (Fig 4), we aimed to understand how they are organized globally in the network by using energy-energy mixing analysis. Specifically, in the energy-energy mixing (Fig 5), our goal is to enrich our analysis through studying nontrivial patterns according to which triads are connected. In other words, after considering each triad individually, we intended to extract information about the connections between them. To be more specific, the following questions were our concerns in this regard: Do triads form a module, or are isolated? What types of triads a specific triad with a defined energy value is connected to, and with what energy value? Does a triad with a high (low) energy value tend to be connected with triads of different energies? Results reveal that there are fewer connected triads compared to isolated ones overall. Moreover, T_1 triads are more connected to each other compared to other types. Furthermore, triads with low energy values have more tendency to be connected compared to high energy triads.
While this pattern holds for both essential and nonessential gene networks, the essential network has more triads with a shared link. Besides modified explanations about the meaning of the energy-energy mixing pattern in the last paragraph of the “structural balance theory” subsection of the method section on page 7, in the result section on pages 10 and 11, the following steps that are taken to plot Fig 2 are added for further clarification. Moreover, to clarify Fig 5A, we added Fig 5B to it so that it can be easier for the readers to comprehend how each of the squares has been made. For each square:
1. The spectrums of energy of two specific types of triads are sorted.
2. Through moving on the energy axes, the number of triads that have a common link is counted and saved in a matrix in the Log scale.
3. The previous steps are repeated for all pair types of triads.
4. All 16 squares in 4 rows and 4 columns are merged.
Figure 2. An overview of creating connection between triads for one square (T_3 [+ + +] and T_2 [+ - +] triads).
About the signs of edges, as mentioned in the “data” subsection of the method section, the signed and weighted matrices studied here are presented publicly by Costanzo and his colleagues (ref. 2). To better clarify how these signs are computed, a whole new paragraph is added to the “data” subsection of the method section, as the last paragraph on pages 3 and 4, describing the method by which signs are assigned to links by Costanzo et al (ref. 3). Briefly speaking, when two genes are mutated, the genetic interaction score (epsilon) between them is obtained in terms of the size of the colony including them. Thus, each gene has an interaction profile with other genes. By calculating the Pearson Correlation Coefficient (PCC) of these interaction profiles, the genetic interaction similarity matrices have been provided.
2. Operationalization: It would be useful to explain further what constitutes an essential vs.
nonessential gene; would be helpful to readers from other disciplines.

Response: We would like to thank the reviewer for the careful comment. We adhere to the definitions of essential and nonessential genes that Costanzo et al. proposed in their impactful study [3]. The features that specify whether a gene is essential or nonessential are the type of mutation, the density (sparsity) of its corresponding network, the strength of interactions, the power of prediction in gene function, and the biological process annotation. Thus, further explanation of what constitutes an essential vs. nonessential gene is added to the first paragraph of the “data” subsection of the method section, on page 3.

3. Operationalization: It is still unclear how you define the "high impact" of significant genes (page 8)? Do you consider genes that sit on many ’balanced’ walks, or are there any measures you used to operationalize ’impact’?

Response: We appreciate the reviewer's helpful comment. As Estrada [4] has proposed, the degree of balance of each given node ($K_i$) is calculated according to Eq (3). In our study, we used $K_i$ to find the genes with the maximum value of $K_i$ in the network. These genes participate only in walks with an even number of negative links; that is, they participate only in balanced walks. Since the value of $K_i$ lies between zero and one, we report the genes with $K_i = 1$. As the term "high impact" may cause misunderstanding, we modified it in the manuscript on page 3, line 68. Besides, within the last paragraph of the method section, lines 284-287, on pages 7 and 8, we explained how $K_i$ defines the degree of balance for a given node.
Also, on page 8, as the respected reviewer has mentioned, we modified lines 274-275 (in the revised manuscript, page 11, lines 407-408). Briefly, we changed the term "high impact" to "highest degree of balance" for each given gene with $K_i = 1$.

$$K_i = \frac{[\exp(A(\Sigma))]_{ii}}{[\exp(A(|\Sigma|))]_{ii}} \qquad (3)$$

Last but not least, to make it more informative, we edited the supplementary file and reported the genes with the highest degree of balance, classified by biological process annotation.

4. Findings: You reported mean degree, the ratio between mean of squared degrees and squared of mean degree, modularity, assortativity coefficient, average path length, and clustering coefficient as indicators of a network’s topology. They are useful but not the only indicators of topology; explanations to why these particular measurements are used would be useful.

Response: Thanks to the reviewer for this insightful comment. As the respected reviewer has correctly mentioned, besides the reported features there are various indicators of network topology, such as centrality, deformation ratio, robustness, etc., and we share the reviewer's concern in this regard. Thus, among all the network indicators calculated, we chose those reported in the manuscript for the following three reasons. First, they are in line with our research question and the results we discussed. For example, modularity provides information on the networks’ communities, which is a crucial feature in gene network studies [3]. Likewise, by computing the clustering coefficient we observe the networks’ tendency to form triads, which are the basic building blocks in research based on balance theory. Similarly, the assortativity (disassortativity) indicator denotes the tendency of configurations to be connected with similar (different) counterparts. Second, some of the indicators overlap conceptually with the studied features. For example, by calculating the mean degree we can have a sense of centrality.
Third, other indicators such as robustness are useful for other research questions but are not directly related to our study. However, as the respected reviewer has mentioned, we calculated and added another useful feature to the revised manuscript, as below:

Small-world propensity ($\phi$): To quantify the extent to which a network displays a small-world structure, the small-world propensity, $\phi$, is defined as in Eq (4). $\Delta_C$ and $\Delta_L$ in Eq (4) are the deviations of the clustering coefficient and the path length, calculated as $\Delta_C = \frac{C_{latt} - C}{C_{latt} - C_{rand}}$ and $\Delta_L = \frac{L - L_{rand}}{L_{latt} - L_{rand}}$, respectively,

$$\phi = 1 - \sqrt{\frac{\Delta_C^2 + \Delta_L^2}{2}} \qquad (4)$$

Moreover, the following two indicators have been calculated as well, but not reported:

Number of hubs (NHUBS): Nodes with degrees that exceed the average degree of the network are considered hubs. This measurement, given by Eq (5), would not add much information in line with our aim. Thus, it is not reported in our study. Also, we were looking for genes that have a high degree of balance in the structure, not just those with high degrees.

$$NHUBS = \sum_i [k_i > \langle k \rangle] \qquad (5)$$

Synchronizability (S): This feature expresses the network’s ability to synchronize and is calculated from the eigenvalues of the graph’s Laplacian matrix ($\Lambda = D - A$). Here, D is the diagonal matrix containing the nodal degrees, and A is the unsigned adjacency matrix. S, given by Eq (6), is defined as the ratio between the first non-zero eigenvalue $\lambda_2$ and the largest eigenvalue $\lambda_{max}$ of $\Lambda$.
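The three indicators described above can be sketched numerically as follows. This is a minimal illustration, not the code used for the manuscript: the function names are hypothetical, the lattice and random reference values for $\phi$ are assumed to be supplied by the caller, the adjacency matrix is assumed to be binarized, and clipping the deviations to [0, 1] is an assumption that keeps $\phi$ between zero and one.

```python
import numpy as np

def small_world_propensity(C, L, C_latt, C_rand, L_latt, L_rand):
    """Eq (4): phi = 1 - sqrt((dC^2 + dL^2) / 2)."""
    d_C = (C_latt - C) / (C_latt - C_rand)
    d_L = (L - L_rand) / (L_latt - L_rand)
    # Assumption: deviations are clipped to [0, 1] so that phi stays in [0, 1].
    d_C, d_L = np.clip([d_C, d_L], 0.0, 1.0)
    return float(1.0 - np.sqrt((d_C**2 + d_L**2) / 2.0))

def number_of_hubs(A):
    """Eq (5): count nodes whose degree exceeds the mean degree."""
    degrees = (np.asarray(A) != 0).sum(axis=1)
    return int((degrees > degrees.mean()).sum())

def synchronizability(A, tol=1e-10):
    """Eq (6): first non-zero over largest eigenvalue of the Laplacian D - A."""
    # Assumption: signs and weights are dropped, leaving a 0/1 adjacency matrix.
    A = (np.asarray(A) != 0).astype(float)
    laplacian = np.diag(A.sum(axis=1)) - A
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    nonzero = eigenvalues[eigenvalues > tol]
    return float(nonzero[0] / eigenvalues[-1])

# Toy example: a path graph on three nodes.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(number_of_hubs(path), synchronizability(path))
```

The toy graph only serves to show how the functions are called; it has no connection to the gene networks analyzed in the manuscript.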
While this indicator seems to provide useful information, its insight as a spectral analysis is far from our approach, which goes beyond pairwise interactions.

$$S = \frac{\lambda_2}{\lambda_{max}} \qquad (6)$$

The results of these features are presented in Table 1.

Table 1. Two other network features.
Feature              Essential          Nonessential
Number of hubs       508 (48.8462%)     2131 (48.1038%)
Synchronizability    0.3127             0.1759

Thus, at the beginning of the “network analysis” subsection of the method section (page 4), we highlighted these informative, well-known, and standard indicators, as has similarly been done by Barabási in his highly cited and seminal work [5]. Besides, we added a new paragraph at the beginning of the result section on page 8 to explain the reason for selecting these indicators. Moreover, Fig 2 and Table 2 are modified to include the small-world propensity ($\phi$).

5. Findings: In table 4, there’s an index value of 0 for genes-K shuffled cell. Can you explain the meaning of an index of 0 here?

Response: Thanks to the reviewer. Since the value of $K_{Shuffled}$ is too small (i.e., 2.737e-12, with mean = 2.952e-12 and std = 5.631e-13), it is displayed as 0.000. Specifically, K in Eq (7) quantifies how close to balance an unbalanced network is,

$$K = \frac{\mathrm{tr}\,\exp(A(\Sigma))}{\mathrm{tr}\,\exp(A(|\Sigma|))} \qquad (7)$$

Moreover, when we expand the signed matrix exponential $\exp(A(\Sigma))$ in Eq (7) over walks of all lengths, there can be some negative terms in the numerator, which represent unbalanced walks. In the denominator, however, all terms of the expansion of the unsigned matrix exponential $\exp(A(|\Sigma|))$ are positive. Therefore, the walk balance index (K) ranges between zero and one. If all walks present are positive, this index reaches its maximum value, which is one. A value closer to one implies more balance in the network, while a value close to zero indicates a largely unbalanced network.
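To make the behaviour of Eq (7) concrete, the sketch below evaluates the walk balance index, together with the node-level values $K_i$ of Eq (3), for two small signed triangles. It is an illustration under stated assumptions, not the implementation used in the study: the function names `exp_diagonal` and `walk_balance` are hypothetical, the adjacency matrix is assumed to be symmetric, and the matrix exponential is obtained from an eigendecomposition.

```python
import numpy as np

def exp_diagonal(M):
    """Diagonal entries of exp(M) for a symmetric matrix M."""
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    # exp(M)_ii = sum_k V_ik^2 * exp(lambda_k)
    return (eigenvectors**2) @ np.exp(eigenvalues)

def walk_balance(A):
    """Return (K, K_i) for a symmetric signed adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    signed = exp_diagonal(A)            # diagonal of exp(A(Sigma))
    unsigned = exp_diagonal(np.abs(A))  # diagonal of exp(A(|Sigma|))
    K = signed.sum() / unsigned.sum()   # ratio of traces, Eq (7)
    K_i = signed / unsigned             # node-level ratio, Eq (3)
    return float(K), K_i

# Hypothetical toy graphs: a triangle with two negative links (balanced)
# and a triangle with one negative link (unbalanced).
balanced = [[0, 1, -1], [1, 0, -1], [-1, -1, 0]]
unbalanced = [[0, 1, 1], [1, 0, -1], [1, -1, 0]]
print(walk_balance(balanced)[0])
print(walk_balance(unbalanced)[0])
```

For the first triangle the index equals one, whereas a single negative link lowers it to roughly 0.69, which matches the reading that values closer to one indicate more balance.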
For example, when we shuffle the network, all structure is lost. Thus, it can be inferred that for a random network the walk balance index, which measures the amount of balance in the network, takes its lowest possible value, that is, zero. To clarify this point in the manuscript, we explained this notion more precisely within the last paragraph of the method section, on page 7, lines 275-282, and the last paragraph of the result section on page 11, lines 398-401. While K is nearly zero for the shuffled version of the essential network, its value for the shuffled version of the nonessential network is 0.131. This minor difference is due to the different percentages of positive (negative) links in the two networks, which are also preserved in their corresponding shuffled networks. To be specific, the percentages of positive and negative links in the essential gene network are nearly equal (50.1% and 49.9% for positive and negative links, respectively). Thus, when this network is shuffled, the near equality of the signs results in a walk balance index very close to zero. In the nonessential gene network, by contrast, the percentages of positive and negative links differ from each other: there are 63.5% positive links, whereas the percentage of negative links is 36.4%.

6. Findings: Given the unequal sizes of the genes vs. nonessential genes network, with the nonessential genes network being much bigger, that could explain the higher ’likelihood’ for finding balanced configurations for this particular network. Do you consider controlling for the size of the networks, and/or normalization techniques to mitigate the sampling/size difference here?

Response: We thank the reviewer greatly for the comment. As the respected reviewer has mentioned, the nonessential gene network is larger than the essential one.
However, the nonessential gene network is sparser than the essential network (Table 2). That is, the fraction of actual links over all possible links is 0.162 in the nonessential gene network, whereas it is 0.461 in the essential gene network. Moreover, the density of triads in the nonessential gene network is lower than in the essential gene network. Specifically, the fraction of triads over all possible triads is 0.006 in the nonessential gene network, yet 0.109 in the essential gene network. Therefore, this could explain a lower "likelihood" of finding balanced configurations in the nonessential gene network. In analyzing the balanced and unbalanced configurations of the essential and nonessential networks within the framework of structural balance theory, we compare each network only with its corresponding shuffled network. As a result, we observe that balanced and unbalanced triads are over- and underrepresented, respectively, in both networks compared with the corresponding shuffled networks of exactly the same size. Likewise, Leskovec et al., in their highly referenced study, performed this analysis for three networks of different sizes, each with its shuffled counterpart [6], as shown in Fig 3. As can be seen in Fig 3 (Table 1), there are three networks with different sizes, namely Epinions, Slashdot, and Wikipedia. Then, in Table 2, $p(T_i)$ and $p_0(T_i)$ are introduced as the probability of each type of triad in the original and shuffled networks, respectively. Finally, in Table 3, the results for these quantities, which are independent of the networks’ sizes, are reported. In other words, these networks are compared with their corresponding shuffled networks, which have the same sizes, not with each other.

Figure 3.
Comparison of three networks of different sizes, each with its shuffled counterpart, in terms of all four types of triads [6].

About our analysis of walks of all lengths, it is worth mentioning that K, as the density of balanced walks of all lengths with values between zero and one, is independent of size. Therefore, the greater this index is for a network, the more balanced the network is. Besides, similar to what was concluded when considering triads, the walk balance index (K) shows that both networks are more balanced than their corresponding shuffled ones. Estrada has performed the same comparison in his study [4]. Therefore, we have revised all three sentences in the manuscript that state that the nonessential network is more balanced than the essential network. We applied this point in the last sentence of the abstract, on page 1; the result section, on page 11, lines 401-405; and the discussion section, on pages 12 and 13, lines 446-449. Finally, with special thanks to the respected reviewer, regarding the normalization of indicators that depend on size (clustering coefficient and average path length), we have added three new paragraphs, including three new equations and references, to the “network analysis” subsection of the method section, on pages 4 and 5. Furthermore, the second paragraph of the result section on page 8, Fig 2, and Table 2 are modified as well.

References

1. Moradimanesh Z, Khosrowabadi R, Eshaghi Gordji M, Jafari GR. Altered structural balance of resting state networks in autism. Scientific Reports. 2021; 11:1966. https://doi.org/10.1038/s41598-020-80330-0 [Ref 43 in the manuscript]
2. http://boonelab.ccbr.utoronto.ca/supplement/costanzo2016/
3. Costanzo M, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science. 2016; 353:6306. https://doi.org/10.1126/science.aaf1420 PMCID: PMC5661885 [Ref 48 in the manuscript]
4. Estrada E, Benzi M.
Walk-based measure of balance in signed networks: Detecting lack of balance in social networks. Physical Review E. 2014; 90:042802. https://doi.org/10.1103/PhysRevE.90.042802 PMID: 25375544 [Ref 47 in the manuscript]
5. Barabási AL, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics. 2004; 5(2):101-113. https://doi.org/10.1038/nrg1272 PMID: 14735121 [Ref 1 in the manuscript]
6. Leskovec J, Huttenlocher D, Kleinberg J. Signed networks in social media. Proc. SIGCHI Conf. Hum. Factors Comput. Syst. 2010; 22:1361-1370. https://doi.org/10.1145/1753326.1753532 [Ref 3 in the manuscript]

Submitted filename: Response to Reviewers.pdf

10 Mar 2022

The structure balance of gene-gene networks beyond pairwise interactions

PONE-D-21-31371R1

Dear Dr. Jafari,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Note that one reviewer has a minor suggestion with regard to weak balance theory that you may choose to incorporate.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact.
If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Bryan C Daniels
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions
Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5.
Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: Thank you for addressing my comments and making the appropriate changes to the manuscript.

One minor suggestion in terms of conceptualization of strongly vs. weakly balanced triads: while you have cited Davis' work, which is foundational to the understanding of SBT, I believe the triad T0 (- - -) should actually be a weakly balanced triad (under Davis' Weak Balance Theory), as opposed to weakly unbalanced. Thus, with this definition, triads T0, T1, and T3 are all balanced, and triad T2 (+ + - ) is the only unbalanced triad type. I would recommend you to revisit the tenets of Weak Structural Balance Theory (WSBT) if you would still want to include this version of structural balance theory in your conceptualization.

A way to ensure that you're consistent with different versions of structural balance theory is to clarify what properties of balance you are considering for this specific network context, and your own rationale for conceptualizing "strong" vs. "weak" balance in this manner.

**********

7. PLOS authors have the option to publish the peer review history of their article.
If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

21 Mar 2022

PONE-D-21-31371R1

The structure balance of gene-gene networks beyond pairwise interactions

Dear Dr. Jafari:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Bryan C Daniels
Academic Editor
PLOS ONE