
Molecular Network Analysis in Rabies Pathogenesis Using Cooperative Game Theory.

Abstract

BACKGROUND AND PURPOSE: Recently, many researchers from different fields of science have used networks to analyze complex relational big data. Identifying which nodes are more important than others, known as centrality analysis, is a key issue in biological network analysis. Although several centralities have been introduced, degree, closeness, and betweenness centralities are the most popular. These centralities are based on the individual position of each node, and the cooperation and synergies between nodes are ignored.
OBJECTIVES: Since network function is in many cases a consequence of cooperation and interaction between nodes, classical centralities have been extended from individual nodes to groups of nodes using concepts from cooperative game theory. In this study, we analyze the protein interaction network inferred in rabies and rank gene products by group centrality measures to identify novel gene candidates.
MATERIALS AND METHODS: For this purpose, we used a game-theoretic approach under three scenarios, in which the power of a coalition of genes was assessed using different criteria, including the neighbors of genes in the network and the predefined importance of the genes in their neighborhood. The Shapley value of each such game was used as a new centrality. We analyzed the network of gene products implicated in rabies, which has 1059 nodes and 8844 edges; centrality analysis was performed using the CINNA package in R.
RESULTS: Based on the three scenarios, we selected genes with the highest Shapley values that ranked low under classical centralities. Enrichment analysis of the genes selected in scenario 1 points to important pathways in rabies pathogenesis. Pairwise correlation analysis reveals that changing the node weights across scenarios can substantially affect the ranking of genes in the network.
CONCLUSION: Prior knowledge of the disease and of the network topology enables us to design an appropriate game and, consequently, to infer biologically important nodes (genes) in the network. Clearly, no single centrality can capture all the significant features embedded in a network.


Keywords: Centrality analysis; Cooperative game theory; Protein interaction network; Shapley value

Year: 2020 PMID: 33850945 PMCID: PMC8035416 DOI: 10.30498/IJB.2020.2551

Source DB: PubMed Journal: Iran J Biotechnol ISSN: 1728-3043 Impact factor: 1.671

1. Background

Network analyses have been widely used in different fields of science, such as biology, physics, and the social sciences. In many real networks, some specific nodes play more important roles than others (1). Centralities are functions that assign a real number to each node based on its particular characteristics and accordingly rank the network nodes. Many centralities have been introduced; among them, degree, closeness, and betweenness centrality are the most popular. Degree centrality (2) is a simple centrality that counts the number of edges of each node. Closeness centrality (3) is the reciprocal of the mean distance from a node to all other nodes of the network. Betweenness centrality (4) measures how often a node lies on the shortest paths between pairs of other nodes. Centrality analysis of biological networks, including co-expression, signaling, and protein-protein interaction networks, has been successfully used to identify important genes or proteins in the very complex system of pairwise relations in the network (5). Applications of centrality measures to the analysis of biological networks can be found in (6, 10). Classical centralities do not consider the cooperation and synergies between nodes, whereas in some cases the function of the network is highly dependent on the cooperation and interaction between groups of nodes (11). Everett and Borgatti introduced the concept of group centrality as an extension of classical centrality measures that considers a group of nodes instead of only individual nodes (12). As an example, consider the sample network of 9 nodes in Figure 1A. Let nodes represent important places in a city and edges represent communication routes between them. Where should two police stations be placed so that together they cover as much of the network as possible? Choosing nodes 1 and 5, which have the highest degrees, covers 6 nodes, whereas nodes 5 and 7 together cover the whole network.
This example shows that, although node 7 has no higher degree centrality than node 1 (both have degree 3), the joint degree centrality of {5, 7} makes it the more appropriate choice of important nodes. Recently, Cesari et al. applied cooperative game theory and the Shapley value to assess the relevance of each gene in interaction with the others in a co-expression network (13, 14). We used similar ideas to analyze the protein-protein interaction network of gene products implicated in rabies, constructed by Azimzadeh Jamalkandi et al. (15).
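The police-station example can be checked in a few lines of Python. This is a minimal sketch, not the authors' code (they worked in R with CINNA). The Figure 1A edge list is an assumption: it is our reconstruction, chosen because it reproduces the degree, closeness, and betweenness vectors in the Figure 1 caption and the value v({1,5}) = 6 quoted in Example 1. `covered` is a hypothetical helper that returns a group together with all its direct neighbors, the quantity behind group degree centrality.

```python
from collections import defaultdict

# Assumed edge list for Figure 1A: our reconstruction, chosen because it
# reproduces the degree, closeness and betweenness vectors in the Figure 1
# caption. It is not taken from the original figure.
EDGES = [(1, 5), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5),
         (5, 6), (6, 7), (7, 8), (7, 9)]


def adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def covered(group, adj):
    """Hypothetical helper: the group plus every node adjacent to a member."""
    reach = set(group)
    for node in group:
        reach |= adj[node]
    return reach


adj = adjacency(EDGES)
print(len(covered({1, 5}, adj)))  # 6: nodes 1 and 5
print(len(covered({5, 7}, adj)))  # 9: the whole network
```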

2. Objective

In this paper, we briefly review game-theoretic network centrality and apply three scenarios to the protein-protein interaction network inferred in rabies. Each scenario captures different topological features of the network and highlights gene products that classical centralities do not rank highly.

3. Materials and Methods

In game theory, a cooperative game studies players who can form coalitions and thereby attain higher or lower payoffs than the sum of what each player would obtain individually; in other words, cooperative game theory addresses the synergy among groups of players. More formally, a cooperative game consists of a set of players N = {1, 2, ..., n} and a characteristic function v: 2^N → ℝ that assigns to each coalition S ⊆ N a real value, its worth (payoff) (16). It is required that v(∅) = 0; a game is called superadditive if, in addition, v(S ∪ T) ≥ v(S) + v(T) for any coalitions S and T with S ∩ T = ∅. A cooperative game is denoted by G = (N, v). The central question is how to allocate the total payoff v(N) fairly among the players. A solution concept is a vector θ(v) = (θ1(v), ..., θn(v)) that represents the allocation to each player; θi(v) is the value, or power, of the i-th player in the game. The Shapley value is one of the most popular solution concepts in cooperative game theory. It assigns to each player the average of its marginal contributions over all possible orders in which the grand coalition can be formed, and it satisfies the following principles, known as the Shapley axioms (17):
a- Efficiency: the total payoff is completely distributed among the players, i.e. $\sum_{i \in N} \theta_i(v) = v(N)$.
b- Symmetry: if two players play equal roles in the payoffs obtained by coalitions, they should receive equal shares. That is, for two players i and j, if v(S ∪ {i}) = v(S ∪ {j}) for every coalition S containing neither i nor j, then θi(v) = θj(v).
c- Dummy axiom: if v(S ∪ {i}) = v(S) for every coalition S not containing i, then θi(v) = 0. That is, a player whose presence or absence in any coalition makes no difference receives no share of the payoff.
d- Additivity: if u and v are characteristic functions, then θ(v + u) = θ(v) + θ(u).

It can be proven that the Shapley value is the unique solution concept satisfying these four axioms, and it is given by (18):

$$\theta_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr] \qquad (1)$$

Consider a network consisting of a set of nodes N and a set of edges E between nodes. In this paper, we study the protein interaction network of infected cells to elucidate the rabies-implicated signal transduction network. Here, N is the set of gene products and the set of edges E describes the interactions among them. The network was constructed by meta-analyzing whole-transcriptome microarray datasets of the CNS infected by strain CVS-11 and integrating them with interactome data using computational and statistical methods (15). Following Cesari et al. (13, 14), we consider the coalitional game G = (N, v), where N is the set of gene products in the network and the characteristic function v assigns a worth to each coalition of gene products S ⊆ N:

$$v(S) = \sum_{j \in S \cup \mathrm{Nei}(S)} a_j \qquad (2)$$

where Nei(S) is the set of gene products directly connected to at least one node in S, and aj represents the predefined importance of gene product j in the network. We analyze the network under three scenarios. In the first scenario, all nodes are treated alike, without any prior information, and therefore aj = 1 for all nodes. In the second scenario, we assign weights to the nodes with an n-tuple vector (n is the number of nodes in the network). Such weights generally indicate the importance of the nodes in the network and can be assigned using different experimental and computational approaches. Here, we assign weights as follows: first, the network was clustered using the ClusterONE algorithm, and then the number of nodes in each cluster was used as the weight of the nodes in that cluster. Cesari et al. (14) have shown that the Shapley value of this coalitional game can be calculated as

$$\theta_i(v) = \sum_{j \in \{i\} \cup \mathrm{Nei}(i)} \frac{a_j}{1 + d_j} \qquad (3)$$

where aj is the weighting vector and dj is the number of neighbors of node j.
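Equation (1) can be evaluated directly for a handful of players by averaging marginal contributions over all orderings, which gives a way to sanity-check the closed form in equation (3) against the coverage game of equation (2). The sketch below does this with exact fractions on a small hypothetical network with hypothetical weights aj (nothing here comes from the rabies data) and also checks the efficiency axiom. Brute force costs n! evaluations, so it only serves as a check on toy graphs, not as a way to score a 1059-node network.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Hypothetical toy network and node weights a_j, for illustration only.
ADJ = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}
WEIGHT = {"A": 1, "B": 2, "C": 1, "D": 3, "E": 1}


def v(coalition):
    """Equation (2): total weight of S together with its neighbours Nei(S)."""
    reach = set(coalition)
    for node in coalition:
        reach |= ADJ[node]
    return sum(WEIGHT[j] for j in reach)


def shapley_brute_force(players):
    """Average marginal contribution over all n! orderings (same value as eq. (1))."""
    total = {p: Fraction(0) for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = v(coalition)
            coalition.add(p)
            total[p] += v(coalition) - before
    return {p: total[p] / factorial(len(players)) for p in players}


def shapley_closed_form(players):
    """Equation (3): sum of a_j / (1 + d_j) over j in {i} and Nei(i)."""
    return {i: sum(Fraction(WEIGHT[j], 1 + len(ADJ[j])) for j in {i} | ADJ[i])
            for i in players}


players = sorted(ADJ)
brute = shapley_brute_force(players)
closed = shapley_closed_form(players)
print(brute == closed)                    # True: both routes agree
print(sum(brute.values()) == v(players))  # True: payoffs sum to v(N)
```

For instance, node A comes out at 7/3 by both routes.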
Thus, all weights are 1 in the first scenario, and in the second scenario each weight equals the number of nodes in the cluster to which the node belongs (the largest such cluster, since clusters may overlap). According to equation (3), a node connected to many low-degree nodes attains a high Shapley value. In the third scenario, the value of each coalition was defined as the number of coalition members plus all nodes within distance k of them (here k = 2, i.e. neighbors and neighbors of neighbors), as described by Michalak et al. (16). In this case, the characteristic function is defined by

$$v(S) = \bigl|\, S \cup \mathrm{Nei}(S, k) \,\bigr| \qquad (4)$$

where Nei(S, k) is the set of nodes within distance k of at least one member of S, and the Shapley value is calculated as follows (16):

$$\theta_i(v) = \sum_{j \in \{i\} \cup \mathrm{Nei}(i, k)} \frac{1}{1 + \deg(j, k)} \qquad (5)$$

where Nei(i, k) is the set of nodes within distance k of node i and deg(j, k) = |Nei(j, k)| is the number of such nodes for node j. Example 1: Consider the sample network in Figure 1A. In the first scenario, the value of each coalition is the number of distinct nodes reachable from the coalition in at most one step. For example, v({1,5}) = |{1,5}| + |{2,3,4,6}| = 6. According to equation (3), the Shapley value vector is:
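As a sketch under stated assumptions, equation (3) with aj = 1 can be evaluated node by node in Python. The Figure 1A edge list is again our own reconstruction (an assumption, chosen because it reproduces the degree, closeness, and betweenness vectors in the Figure 1 caption and v({1,5}) = 6 above), and `shapley_scenario1` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed edge list for Figure 1A (our reconstruction from the caption's
# degree, closeness and betweenness vectors; not given in the text).
EDGES = [(1, 5), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5),
         (5, 6), (6, 7), (7, 8), (7, 9)]

adj = {n: set() for n in range(1, 10)}
for u, w in EDGES:
    adj[u].add(w)
    adj[w].add(u)


def shapley_scenario1(adj, weight=None):
    """Equation (3); with weight=None every a_j is 1, as in scenario 1."""
    if weight is None:
        weight = {n: 1 for n in adj}
    return {i: sum(Fraction(weight[j], 1 + len(adj[j])) for j in {i} | adj[i])
            for i in adj}


sv = shapley_scenario1(adj)
for node in sorted(sv):
    print(node, sv[node])
print(sorted(sv, key=sv.get, reverse=True)[:2])  # [5, 7]
```

Under this reconstruction, nodes 1 to 9 receive 13/12, 2/3, 3/4, 3/4, 23/12, 3/4, 19/12, 3/4, and 3/4; the values sum to 9 = v(N), and nodes 5 and 7 come out on top, consistent with the text.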

Figure 1

Sample network. A) Nodes 1 and 5 have the highest degree centrality, while nodes 5 and 7 cover the whole network. B) Nodes 4 and 8 have a similar degree, but node 8 attains a higher Shapley value than node 4. C) The hub nodes (nodes 5 and 1) are not directly connected to each other. D) The hub nodes (nodes 2 and 4) are directly connected to each other. The color intensity of each node represents its Shapley value.

Nodes 5 and 7 attain the highest Shapley values. The degree, closeness, and betweenness centralities are as follows:
Degree = (3, 1, 2, 2, 5, 2, 3, 1, 1)
Closeness = (0.44, 0.4, 0.42, 0.42, 0.61, 0.57, 0.47, 0.33, 0.33)
Betweenness = (0.01, 0, 0, 0, 0.69, 0.53, 0.46, 0, 0)
Therefore, the Shapley value ranks nodes 5 and 7 differently from the classical centralities. Note that a node whose neighbors have low degree attains a higher Shapley value than a node whose neighbors have higher degree, as the following example shows.

Example 2: A sample network with 10 nodes is shown in Figure 1B. Nodes 4 and 8 have a similar degree and symmetric positions in the network. Suppose again that all nodes have the same prior weight, equal to 1, i.e. ai = 1 for all i ∈ N. The Shapley value vector for this network is as follows: Node 8, which is connected to nodes that themselves have low degree, gets a higher Shapley value than node 4. Consequently, removing the nodes with the highest Shapley values tends to split the network into many connected components and isolated nodes.

Example 3: Consider the two sample networks in Figures 1C and 1D. In both, each hub node has three neighbors. The Shapley values for Figures 1C and 1D are and , respectively. The hubs in Figure 1C are not directly connected to each other, whereas in Figure 1D the hubs are adjacent, which decreases their Shapley values. Because each neighbor j contributes aj/(1 + dj) to the Shapley value in equation (3), a high-degree neighbor such as another hub contributes little, so whether or not hubs are adjacent strongly affects their Shapley values.
In the rabies network analyzed below, the average distance between hubs is approximately 2. To diminish the effect of hubs on the Shapley value, we therefore used scenario 3.
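The effect of hub adjacency can be illustrated with equation (3) on two hypothetical graphs. These are not the networks of Figures 1C and 1D, whose edges are not given in the text: in both toy graphs, hubs H1 and H2 have degree 3 and two leaves each; in one they are joined through an intermediate node M, in the other they are directly adjacent.

```python
from fractions import Fraction


def build(edges):
    """Undirected adjacency map from an edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


def shapley(adj, i):
    """Equation (3) with every a_j = 1: sum of 1 / (1 + d_j) over {i} and Nei(i)."""
    return sum(Fraction(1, 1 + len(adj[j])) for j in {i} | adj[i])


# Hypothetical graphs: hubs H1 and H2 have degree 3 and two leaves each.
separate = build([("H1", "M"), ("H2", "M"),          # hubs joined through M
                  ("H1", "a"), ("H1", "b"), ("H2", "c"), ("H2", "d")])
adjacent = build([("H1", "H2"),                      # hubs directly linked
                  ("H1", "a"), ("H1", "b"), ("H2", "c"), ("H2", "d")])

print(shapley(separate, "H1"))  # 19/12
print(shapley(adjacent, "H1"))  # 3/2
```

The only difference is the neighbor through which H1 reaches the rest of the graph: a degree-2 intermediary contributes 1/3, while another degree-3 hub contributes only 1/4.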

4. Results

In this study, we analyze the network of gene products implicated in rabies. The network was constructed by Azimzadeh Jamalkandi et al. (15) by integrating the results of a meta-analysis of whole-transcriptome microarray datasets of the CNS infected by strain CVS-11 with interaction data using computational methods. Specifically, they reconstructed a weighted protein-protein interaction network of infected cells based on differentially expressed genes to clarify the rabies-implicated signal transduction network. The weights are STRING combined scores (19). We kept only edges with a score greater than 0.4 to filter out weak and false-positive interactions. The giant component of the resulting network, obtained using the CINNA package (20), has 1059 nodes and 8844 edges. In what follows, we analyze this network under the three scenarios introduced in the Materials and Methods section.
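The preprocessing described here (drop edges with a combined score of 0.4 or below, then keep the giant component) can be sketched in Python as follows. The interaction list is made up for illustration, and `filter_edges` and `giant_component` are hypothetical stand-ins, not CINNA's R API. STRING's downloadable link files store the combined score as an integer from 0 to 1000, so the same cutoff would be 400 there.

```python
from collections import deque

# Hypothetical scored interactions: (gene_a, gene_b, combined score in [0, 1]).
INTERACTIONS = [
    ("G1", "G2", 0.92), ("G2", "G3", 0.55), ("G3", "G1", 0.41),
    ("G4", "G5", 0.70), ("G5", "G6", 0.38), ("G6", "G7", 0.15),
    ("G3", "G8", 0.47),
]


def filter_edges(interactions, threshold=0.4):
    """Keep edges whose score is strictly greater than the threshold."""
    return [(a, b) for a, b, score in interactions if score > threshold]


def giant_component(edges):
    """Node set and edge list of the largest connected component (BFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node] - component:
                component.add(nb)
                queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best, [(a, b) for a, b in edges if a in best]


nodes, edges = giant_component(filter_edges(INTERACTIONS))
print(sorted(nodes), len(edges))  # ['G1', 'G2', 'G3', 'G8'] 4
```

Nodes whose every edge falls below the cutoff (G6 and G7 here) disappear before the component search.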

4.1. First Scenario

In the first scenario, the value of a group of gene products is determined by its members plus their neighbors. The classical centralities (degree, closeness, and betweenness) and the Shapley value of the associated game were calculated. The correlation coefficients, scatter plots, and distributions of centrality values for each pair of centralities are shown in Figure 2. Betweenness and the Shapley value are highly correlated, whereas the other pairs of centralities are not significantly correlated.

Figure 2

Scatter plots, distributions and the correlation coefficient between centralities in the first scenario.

In order to assess the effect of removing top-ranked nodes, the top twenty nodes according to betweenness centrality and according to Shapley value centrality were each removed from the network. In the first case (removing the nodes with the highest betweenness centrality), the network was split into 106 components: the giant component with 909 nodes, smaller components of 11, 5, 3, and 2 nodes, and 94 single nodes. By contrast, removing the top 20 nodes according to Shapley value centrality split the network into 166 components: the giant component with 953 nodes, smaller components of 5, 4, 3, and 2 nodes, and 151 single nodes. This suggests that high Shapley value nodes interact directly with many otherwise poorly connected nodes, so their removal splits off many connected components with few gene products or even isolated nodes. To highlight the added value of the Shapley value centrality, we selected genes with the highest Shapley value ranks that had low ranks under classical centralities. Three of the 40 top Shapley value-ranked genes, namely TPK1, BMP2, and SMAD4, were not among the 80 top-ranked genes by any classical centrality (degree, closeness, or betweenness).
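The removal experiment can be reproduced on a toy scale: delete the top-k nodes by some score and list the sizes of the remaining connected components. The sketch below uses our assumed reconstruction of Figure 1A, the betweenness values from the Figure 1 caption, and scenario-1 Shapley values from equation (3); k = 2 stands in for the top twenty used on the rabies network, and all helper names are hypothetical.

```python
from collections import deque

# Assumed reconstruction of Figure 1A (not given explicitly in the text).
EDGES = [(1, 5), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5),
         (5, 6), (6, 7), (7, 8), (7, 9)]
# Betweenness values as listed in the Figure 1 caption.
BETWEENNESS = {1: 0.01, 2: 0, 3: 0, 4: 0, 5: 0.69, 6: 0.53, 7: 0.46, 8: 0, 9: 0}


def adjacency(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


def shapley_scenario1(adj):
    """Equation (3) with every a_j = 1."""
    return {i: sum(1 / (1 + len(adj[j])) for j in {i} | adj[i]) for i in adj}


def top_k(scores, k):
    """The k nodes with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def component_sizes_after_removal(adj, removed):
    """Component sizes, largest first, after deleting the `removed` nodes."""
    remaining = set(adj) - set(removed)
    seen, sizes = set(), []
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node] & remaining:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


adj = adjacency(EDGES)
shapley = shapley_scenario1(adj)
print(component_sizes_after_removal(adj, top_k(BETWEENNESS, 2)))  # [3, 3, 1]
print(component_sizes_after_removal(adj, top_k(shapley, 2)))      # [3, 1, 1, 1, 1]
```

On this toy graph, removing the two highest-betweenness nodes (5 and 6) leaves three components, while removing the two highest-Shapley nodes (5 and 7) leaves five, a small-scale version of the pattern reported above.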

4.2. Second Scenario

In this scenario, a weight vector was calculated that assigns a score to each node. For this purpose, the network was first divided into clusters using the ClusterONE algorithm (21). Because a gene can belong to multiple clusters, the weight of each gene was defined as the size (number of nodes) of the largest cluster it belongs to. The pairwise correlations between the centralities are shown in Figure 3. In this scenario, the correlation coefficient between degree centrality and Shapley value centrality rose to 0.729, and the correlation coefficient between betweenness and the Shapley value decreased to zero. As in the first scenario, we picked genes with the highest Shapley value ranks that had low ranks under classical centralities. Four of the 40 top Shapley value-ranked genes, namely IFI44, SLFN5, RTP4, and IRGM2, were not among the 130 top-ranked genes by classical centralities (degree, closeness, and betweenness). The top twenty nodes according to Shapley value centrality were then removed from the network, which split it into 43 components: the giant component with 986 nodes, smaller components of 5, 3, and 2 nodes, and 35 single nodes. Compared with scenario 1, the number of connected components decreased markedly, possibly as a consequence of weighting nodes by the size of their largest cluster.
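The scenario-2 weighting can be sketched as follows, assuming the ClusterONE result is available as a list of (possibly overlapping) node sets. The network and clusters below are hypothetical, and giving weight 1 to nodes outside every cluster is our assumption; the text does not say how such nodes were handled.

```python
from fractions import Fraction

# Hypothetical network and hypothetical overlapping clusters standing in for
# ClusterONE output; neither comes from the rabies data.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F"), ("F", "G")]
CLUSTERS = [{"A", "B", "C"}, {"C", "D", "E", "F"}]


def adjacency(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


def cluster_weights(nodes, clusters, default=1):
    """a_j = size of the largest cluster containing j.

    Nodes outside every cluster get `default`; that choice is an assumption.
    """
    return {n: max((len(c) for c in clusters if n in c), default=default)
            for n in nodes}


def weighted_shapley(adj, weight):
    """Equation (3): sum of a_j / (1 + d_j) over j in {i} and Nei(i)."""
    return {i: sum(Fraction(weight[j], 1 + len(adj[j])) for j in {i} | adj[i])
            for i in adj}


adj = adjacency(EDGES)
a = cluster_weights(adj, CLUSTERS)
sv = weighted_shapley(adj, a)
print(a["C"], a["G"])    # 4 1
print(sum(sv.values()))  # 23, the total weight v(N)
```

Because the weights only change the numerators in equation (3), the Shapley values still sum to v(N), here the total weight 23.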

Figure 3

Scatter plots, distributions and the correlation coefficient between centralities in the second scenario.

4.3. Third Scenario

The average distance between hubs in the network was approximately 2. Therefore, in the third scenario, the value of each coalition was defined as the number of coalition members plus all nodes within distance two of them; that is, each coalition covers its members, their neighbors, and their neighbors' neighbors. We expected this coalition definition to reduce the correlation coefficient between degree centrality and the Shapley value (Fig. 4).
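Equations (4) and (5) with k = 2 can be computed with a depth-limited breadth-first search. This is a minimal sketch on our assumed reconstruction of Figure 1A; `within_k` and `shapley_k_hop` are hypothetical helper names, not functions from CINNA or any other package.

```python
from collections import deque
from fractions import Fraction

# Assumed reconstruction of Figure 1A (not given explicitly in the text).
EDGES = [(1, 5), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5),
         (5, 6), (6, 7), (7, 8), (7, 9)]

adj = {n: set() for n in range(1, 10)}
for u, w in EDGES:
    adj[u].add(w)
    adj[w].add(u)


def within_k(adj, source, k):
    """Nei(source, k): nodes at distance 1..k from source (source excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond distance k
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist) - {source}


def shapley_k_hop(adj, k=2):
    """Equation (5): sum of 1 / (1 + deg(j, k)) over j in {i} and Nei(i, k)."""
    ball = {j: within_k(adj, j, k) for j in adj}
    return {i: sum(Fraction(1, 1 + len(ball[j])) for j in {i} | ball[i])
            for i in adj}


sv = shapley_k_hop(adj, k=2)
print(max(sv, key=sv.get), sv[6])  # 6 1021/630
print(sum(sv.values()))            # 9, i.e. v(N) = n
```

Under this reconstruction, node 6, which bridges the two halves of the graph, ranks first (about 1.62), ahead of the degree-5 hub, node 5 (353/315, about 1.12), so widening the coalition to two hops shifts credit away from the hub, in line with the motivation for scenario 3.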

Figure 4

Scatter plots, distributions and the correlation coefficient between centralities in the third scenario.

The correlation coefficients of this centrality with closeness and betweenness centralities were about 0.60, while, as expected, the correlation coefficient between degree and the Shapley value decreased in this scenario. Two of the 40 top Shapley value-ranked genes, namely MAML1 and CITED2, were not among the 80 top-ranked genes by classical centralities (degree, closeness, and betweenness).
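The selection rule used in all three scenarios (keep genes in the Shapley top list that fall outside the top list of every classical centrality) can be sketched as below. The cutoffs are parameters (40, and 80 or 130, in the text). The toy scores are the Figure 1 caption values plus scenario-1 Shapley values, rounded to two decimals, from our assumed reconstruction of Figure 1A; breaking ties by input order is also our assumption.

```python
def top_k(scores, k):
    """The k nodes with the highest scores; ties keep input order (assumption)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def novel_candidates(shapley, classical, top=40, exclude_top=80):
    """Shapley top-`top` nodes absent from every classical top-`exclude_top` list."""
    excluded = set()
    for scores in classical.values():
        excluded |= set(top_k(scores, exclude_top))
    return [n for n in top_k(shapley, top) if n not in excluded]


nodes = range(1, 10)
# Classical centralities from the Figure 1 caption; Shapley values (rounded)
# from equation (3) on our assumed reconstruction of Figure 1A.
CLASSICAL = {
    "degree": dict(zip(nodes, (3, 1, 2, 2, 5, 2, 3, 1, 1))),
    "closeness": dict(zip(nodes, (0.44, 0.4, 0.42, 0.42, 0.61, 0.57, 0.47, 0.33, 0.33))),
    "betweenness": dict(zip(nodes, (0.01, 0, 0, 0, 0.69, 0.53, 0.46, 0, 0))),
}
SHAPLEY = dict(zip(nodes, (1.08, 0.67, 0.75, 0.75, 1.92, 0.75, 1.58, 0.75, 0.75)))

print(novel_candidates(SHAPLEY, CLASSICAL, top=2, exclude_top=1))  # [7]
print(novel_candidates(SHAPLEY, CLASSICAL, top=2, exclude_top=3))  # []
```

The result depends strongly on the exclusion cutoff: with a top-3 exclusion list, node 7 is already ranked highly by closeness and drops out.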

5. Discussion

In the first scenario, a coalition covers its members and their neighbors, and, as explained in the Materials and Methods section, a node connected to low-degree nodes attains a high Shapley value. Table 1 compares the rankings of the three genes selected in scenario 1 under different centrality measures. As expected, the average degree of the neighbors of TPK1 and BMP2 is very low (about 5); for SMAD4 it is higher (16.52), close to the network's mean degree of about 16.7 (2 × 8844 / 1059).
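The last column of Table 1, the average degree of a node's neighbors, is straightforward to compute. The sketch below applies it to our assumed reconstruction of Figure 1A; `avg_neighbor_degree` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed reconstruction of Figure 1A (not given explicitly in the text).
EDGES = [(1, 5), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5),
         (5, 6), (6, 7), (7, 8), (7, 9)]

adj = {n: set() for n in range(1, 10)}
for u, w in EDGES:
    adj[u].add(w)
    adj[w].add(u)


def avg_neighbor_degree(adj, node):
    """Mean degree of the node's direct neighbours (last column of Table 1)."""
    neighbours = adj[node]
    return Fraction(sum(len(adj[j]) for j in neighbours), len(neighbours))


for node in (1, 5, 7):
    print(node, avg_neighbor_degree(adj, node))
```

Node 7, ranked second by the scenario-1 Shapley value in this toy graph, has the lowest average neighbor degree of the three (4/3, versus 3 for node 1 and 2 for node 5), the same pattern described for TPK1 and BMP2.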

Table 1

Rankings of the three genes selected in scenario 1 under different centrality measures.

Gene name	Shapley value rank	Degree rank	Closeness rank	Betweenness rank	Average degree of neighbors
TPK1	16	389	721	82	5.06
BMP2	32	465	966	113	5.09
SMAD4	36	262	111	120	16.52

Enrichment analysis of the three genes selected in scenario 1 (TPK1, BMP2, and SMAD4) was performed with Enrichr (22, 23) using the KEGG 2019 Human library. The results are listed in Table 2.

Table 2

KEGG pathways enriched among the three genes selected in scenario 1.

KEGG Pathway	Adjusted P-Value
TGF-beta signaling pathway	0.001
Hippo signaling pathway	0.001
Thiamine metabolism	0.011
Pathways in cancer	0.011
Basal cell carcinoma	0.027
Th17 cell differentiation	0.027
Colorectal cancer	0.027
Chronic myeloid leukemia	0.027
Pancreatic cancer	0.027
Adherens junction	0.027
AGE-RAGE signaling pathway in diabetic complications	0.027
Signaling pathways regulating pluripotency of stem cells	0.027
Cell cycle	0.027
Wnt signaling pathway	0.027
Apelin signaling pathway	0.027
FoxO signaling pathway	0.027
Hepatocellular carcinoma	0.027
Hepatitis B	0.027
Gastric cancer	0.027
Human T-cell leukemia virus 1 infection	0.034
Cytokine-cytokine receptor interaction	0.043

Underlined/bold pathways were reported by Azimzadeh Jamalkandi et al. (15).

In this paper, we analyzed the protein-protein interaction network implicated in rabies with the objective of identifying novel gene candidates acting as intermediaries between hub nodes and leaf nodes. For this purpose, we used a game-theoretic approach under three scenarios, in which the power of a coalition of genes was assessed using different criteria, including the neighbors of genes in the network and the predefined importance of the genes in their neighborhood, and the Shapley value of each game was used as a new centrality. Pairwise correlation analysis reveals that changing the node weights across scenarios can substantially affect the ranking of genes in the network. Therefore, prior knowledge of the disease and of the network topology enables us to design an appropriate game and, consequently, to infer biologically important nodes (genes) in the network. Clearly, no single centrality can capture all the significant features embedded in a network.

15 in total

Björn H Junker; Dirk Koschützki; Falk Schreiber. Exploration of biological network centralities with CentiBiN. BMC Bioinformatics. 2006-04-21.

J Nieminen. On the centrality in a graph. Scand J Psychol. 1974.

Sayed-Hamidreza Mozhgani; Mohadeseh Zarei-Ghobadi; Majid Teymoori-Rad; Talat Mokhtari-Azad; Mehdi Mirzaie; Mohsen Sheikhi; Seyed-Mohammad Jazayeri; Ramin Shahbahrami; Hedayatollah Ghourchian; Mohieddin Jafari; Seyed-Abdolrahim Rezaee; Mehdi Norouzi. Human T-lymphotropic virus 1 (HTLV-1) pathogenesis: A systems virology study. J Cell Biochem. 2018-01-19.

Sadegh Azimzadeh Jamalkandi; Mehdi Mirzaie; Mohieddin Jafari; Hossein Mehrani; Parvin Shariati; Mahvash Khodabandeh. Signaling network of lipids as a comprehensive scaffold for omics data integration in sputum of COPD patients. Biochim Biophys Acta. 2015-07-26.

Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi P Tsafou; Michael Kuhn; Peer Bork; Lars J Jensen; Christian von Mering. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2014-10-28.

Sadegh Azimzadeh Jamalkandi; Sayed-Hamidreza Mozhgani; Hamid Gholami Pourbadie; Mehdi Mirzaie; Farshid Noorbakhsh; Behrouz Vaziri; Alireza Gholami; Naser Ansari-Pour; Mohieddin Jafari. Systems Biomedicine of Rabies Delineates the Affected Signaling Pathways. Front Microbiol. 2016-11-07.