Computational modeling of human-nCoV protein-protein interaction network.

Sovan Saha¹, Anup Kumar Halder², Soumyendu Sekhar Bandyopadhyay³, Piyali Chatterjee⁴, Mita Nasipuri⁵, Subhadip Basu⁶.

Abstract

The novel coronavirus (SARS-CoV2) replicates inside the host cell by interacting with host proteins. Identifying virus-host protein-protein interactions can therefore help in understanding the disease transmission behavior of the virus and in identifying potential COVID-19 drugs. The International Committee on Taxonomy of Viruses (ICTV) has declared that nCoV is genetically highly similar (∼89% similarity) to the SARS-CoV responsible for the 2003 epidemic. On this basis, the present work develops a computational model for the nCoV-Human protein interaction network using the experimentally validated SARS-CoV-Human protein interactions. Initially, level-1 and level-2 human spreader proteins are identified in the SARS-CoV-Human interaction network using the Susceptible-Infected-Susceptible (SIS) model. These proteins are considered potential human targets for nCoV bait proteins. A gene-ontology-based fuzzy affinity function is then used to construct the nCoV-Human protein interaction network at a ∼99.98% specificity threshold. This identifies 37 level-1 human spreaders for COVID-19 in the human protein-interaction network; 2474 level-2 human spreaders are subsequently identified using the SIS model. The derived host-pathogen interaction network is finally validated using six potential FDA-listed drugs for COVID-19, with significant overlap between the known drug target proteins and the identified spreader proteins.

Entities: Chemical

Keywords: Drug target; Fuzzy model; Gene ontology; High-quality interactions; Human-nCoV interactions; Protein interaction network analysis; SARS-CoV2; Spreadability index; Spreader nodes; Susceptible-infected-susceptible model

Mesh：

Substances：
Proteins
RNA, Viral

Year: 2021 PMID： 34902553 PMCID： PMC8662836 DOI： 10.1016/j.ymeth.2021.12.003

Source DB: PubMed Journal: Methods ISSN： 1046-2023 Impact factor: 4.647

Introduction

COVID-19 emerged in the Chinese city of Wuhan (Hubei province) [1]. The first human case of nCoV infection was reported on 31st December 2019 [2]. Within a brief period, the virus spread to almost all nations [3]. The World Health Organization (WHO) observed that the massive outbreak of nCoV was mainly due to community spreading and declared a global health emergency on 30th January 2020 [4]. After assessment, WHO estimated its fatality rate to be 4% [5], which urged researchers worldwide to work together to find an appropriate treatment for this pandemic [6], [7]. Coronaviruses belong to the family Coronaviridae and infect birds and mammals as well as humans. Although the common symptoms of coronavirus infection include the common cold and cough, infection can be accompanied by severe acute or chronic respiratory disease and multiple organ failure, leading to death. Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) were the two major outbreaks, in 2003 and 2012, before SARS-CoV2. SARS originated in Southern China. Its fatality rate was within 14%–15% [8], and 774 people lost their lives among 8804 affected cases. MERS was first reported in Saudi Arabia; 858 of 2494 infected persons died, a much higher fatality rate of 34.4% [9] than that of SARS. The viruses behind all three outbreaks, SARS-CoV, MERS-CoV, and SARS-CoV2, belong to the genus Betacoronavirus within the Coronaviridae. SARS-CoV2 comprises both structural and non-structural proteins. Of the two, structural proteins such as the envelope (E) protein, membrane (M) protein, nucleocapsid (N) protein, and spike (S) protein play a significant role in transmitting the disease by binding to host receptors after entering the human body [10].
So, there is an urgent need to understand and analyze the disease transmission mechanism of this new virus. In this work, Protein-Protein Interaction Networks (PPINs) are the principal resource for studying disease propagation from SARS-CoV2 to humans. PPINs play a crucial role in identifying essential proteins [11], [12], [13], [14], [15] responsible for various diseases, and are also significant in identifying protein functions [16], [17], [18], [19]. According to Lotem et al. [20], although the human PPIN is constantly expanding, very little is known about the human PPIN that arises under disease conditions. With the growing availability of human PPIN data, the primary focus of research has shifted from a basic understanding of the PPIN to the study of the PPINs underlying various kinds of human diseases [21]. According to Ideker et al. [22], PPIN studies mainly fall into four categories: 1) identification of human disease genes based on network analysis, 2) implication of additional genes involved in the disease using protein networks, 3) identification of protein subnetworks involved in diseases, and 4) classification of case-control studies based on PPINs. Host-pathogen PPINs are significant for understanding how infection is transmitted, which is essential for developing new and more effective therapeutics, including rational drug design. Infection and disease progress through interactions between pathogen and host proteins, in which the pathogen plays the active role. Pathogen-host PPINs allow pathogenic microorganisms to exploit host capabilities by manipulating host mechanisms in order to evade the host's immune responses [23], [24], [25]. Detecting target proteins through the analysis of pathogen-host PPINs is therefore a central point of research [13], [26], [27].
Topologically significant proteins having a higher degree of interactions are generally found to be important drug targets. However, proteins with fewer interactions, or that are not topologically prominent, may still be involved in the mechanism of infection because of their relevance to some biological pathway. Moreover, clinically validated Human-nCoV protein interaction data is limited in the current literature. This has motivated us to develop a new computational model for the nCoV-Human PPI network. We have subsequently validated the proteins involved in the host-pathogen interactions against potential Food and Drug Administration (FDA) drugs for COVID-19 treatment. Key aspects of this research work are highlighted below. SARS-CoV has been reported to share ∼89% [28], [29] genetic similarity with nCoV. The SARS-CoV-Human protein–protein interaction network has also been studied widely and is available in the literature [30], [31], [32]. Recently, we developed a computational model to identify potential spreader proteins in the Human-SARS-CoV interaction network using the SIS model [14]. In addition, sequence information for 29 nCoV proteins has been released [33], and gene ontology (GO) information (Biological Process (BP), Molecular Function (MF), Cellular Component (CC)) is available for 14 of them [33], [34]. We have recently developed a method to predict interaction affinity between proteins from the available GO graph [35]. Using it, the interaction affinity between nCoV proteins and potential human target/bait proteins, which are susceptible to SARS-CoV infection, is assessed. Fuzzy affinity thresholding is then applied to detect the high-quality nCoV-Human PPIN. The selected human proteins are considered level-1 human spreader nodes of nCoV. Level-2 spreader nodes in the nCoV-Human PPIN are detected using the spreadability index and validated by the SIS model [14], [36].
Our developed model is validated against the target proteins of the potential FDA drugs for COVID-19 treatment [37]. All related terminologies referred to in this work, such as Ego Network [38] (please see Fig. S1 in the supplementary document), spreadability index [14], Node weight [39], Edge ratio [38], Neighborhood density [38], Betweenness Centrality (BC) [40], Closeness centrality (CC) [41], Degree centrality (DC) [42], and Local average centrality (LAC) [43], are discussed in the supplementary document.

Methodology

Our computational model for the nCoV-Human PPIN consists of two methodologies: 1) identification of spreader nodes by the spreadability index, validated by the SIS model, and 2) the Fuzzy PPI model. First, the spreader nodes of SARS-CoV (the candidate set of nCoV interactors) are identified with the SIS model. Then, the Fuzzy PPI model is applied to extract the nCoV-Human interactions, and finally, nCoV spreaders are identified using the SIS model.

Identification of spreader nodes by spreadability index along with the validation by SIS model

In the nCoV-Human PPIN, nCoV acts as the pathogen ('Bait') while the human host acts as the 'Prey'. Transmission of infection starts when a pathogen enters the host body and infects a host protein, which in turn affects its directly or indirectly connected neighbor proteins. Considering this mode of transmission, the PPINs of humans and SARS-CoV are used to detect spreader nodes. Spreader nodes are those nodes/proteins that transmit the disease quickly among their neighbors. Since not all nodes in a PPIN are spreaders, proper detection of spreader nodes is crucial. Spreader nodes are identified by the spreadability index, which measures the transmission capability of a node/protein. Furthermore, the compactness of the PPIN and its transmission capability are evaluated using centrality analysis. Nodes with high centrality values are usually considered spreader nodes or the most critical nodes in a network. The spreadability index [14] is a centrality-based measure that combines three major topological neighborhood-based features of a network: 1) Node weight [39], 2) Edge ratio [38], and 3) Neighborhood density [38]. Nodes having a high spreadability index are considered spreader nodes. The spreader nodes thus identified are also validated by the SIS model [36]. The SIS model is used to cast the SARS-CoV and SARS-CoV2 outbreaks as a disease model in which proteins are classified by their current infection status. A protein passes through three phases that alternate between two states: 1) S: Susceptible, meaning that every protein is initially not yet infected but at risk of being infected by the disease; 2) I: Infected, meaning that the disease has already infected the protein; and 3) S: Susceptible, meaning that the protein becomes susceptible again after recovering from the infected state. This model is used to compute the overall infection capability of a node after a certain number of iterations.
Thus, the sum of the infection capability of the top selected spreader nodes is computed by this model and compared against the sum obtained for the top critical nodes selected by other existing centrality measures: Betweenness Centrality (BC) [40], Closeness centrality (CC) [41], Degree centrality (DC) [42], and Local average centrality (LAC) [43] (please see the supplementary document for more details). Our proposed method for selecting spreader nodes in the SARS-CoV PPIN [14] has performed better than these existing state-of-the-art measures. The comparison and results are included in our recently published work [14] (please see Fig. S2 and Tables S1–S5 in the supplementary document for more details). The complete source code is available online. A synthetic PPIN is considered in Fig. 1 to demonstrate the entire methodology of the spreadability index (see Table 1). In addition, a comparison of the spreadability index of our proposed model with one of the other methodologies, LAC (computed using CytoNCA [44]), is given in Table 2. $E^{out}_{S_i}$ is the total number of edges outgoing from the ego network of node $i$ [14] (for details please see the supplementary document), whereas $E^{in}_{S_i}$ denotes the total number of interconnections in the neighborhood of node $i$ [38]. $E^{out}_{S_i}$ of node 3 is 6 while $E^{in}_{S_i}$ of node 3 is 3, which highlights that node 3 has the highest transmission ability from its ego network to the outside when compared to other nodes. Node 3 also has the highest spreadability index, but LAC fails to rank node 3 in the first position (it is ranked fifth in Table 2). The same can be observed for some other nodes in the synthetic network. Moreover, the SIS validation shows that the top-ranked spreader nodes selected by the proposed model have a higher summed infection rate (1.19, Table 1) than the top-ranked nodes selected by LAC (0.86, Table 2).
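The SIS validation described above can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the discrete-time update rule, the infection probability `beta`, the recovery probability `gamma`, the function name `sis_infection_rate`, and the toy network are all assumptions chosen for illustration. The score is the average fraction of infected nodes per step when the outbreak is seeded at the given nodes.

```python
import random


def sis_infection_rate(adj, seeds, beta=0.3, gamma=0.1, steps=30, runs=100, rng_seed=0):
    """Average fraction of infected nodes per step for an outbreak started at `seeds`.

    adj: dict mapping each node to an iterable of its neighbours (undirected graph).
    beta / gamma: hypothetical infection and recovery probabilities per step.
    """
    rng = random.Random(rng_seed)
    n_nodes = len(adj)
    total = 0.0
    for _ in range(runs):
        infected = set(seeds)
        for _ in range(steps):
            newly_infected, recovered = set(), set()
            for u in sorted(infected):  # sorted for reproducible random draws
                # S -> I: each infected node may infect its susceptible neighbours
                for v in adj[u]:
                    if v not in infected and rng.random() < beta:
                        newly_infected.add(v)
                # I -> S: an infected node becomes susceptible again
                if rng.random() < gamma:
                    recovered.add(u)
            infected = (infected - recovered) | newly_infected
            total += len(infected) / n_nodes
    return total / (runs * steps)


# Toy undirected network (hypothetical): node 0 is a hub, node 5 is peripheral.
TOY_NETWORK = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0, 4],
    4: [0, 3, 5],
    5: [4],
}

for seed_node in (0, 5):
    rate = sis_infection_rate(TOY_NETWORK, [seed_node])
    print(f"seed {seed_node}: mean infected fraction = {rate:.3f}")
```

Summing such scores over the top-ranked nodes of two competing rankings gives a comparison in the spirit of the last columns of Tables 1 and 2, although the actual parameter values used there are not stated in this section.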

Fig. 1

Table 1

Computation of spreadability index of Fig. 1 and validation of selected top 5 spreader nodes by the SIS model.

Rank	Proteins	EoutSi	EinSi	Edge Ratio	Neighborhood Density	Node Weight	Spreadability Index	Sum of SIS infection rate of top 5 nodes
1	Node 3	6	3	1.75	6.94	2.83	14.99	1.19
2	Node 9	5	4	1.20	7.07	3.00	11.48
3	Node 6	5	2	2.00	3.93	2.60	10.46
4	Node 8	6	2	2.33	2.27	3.25	8.55
5	Node 1	5	4	1.20	4.21	3.40	8.45
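The columns of Table 1 can be cross-checked with a short script. The two formulas below are inferred from the tabulated values rather than quoted from the text: the Edge Ratio column is reproduced by (E_out + 1) / (E_in + 1), and the Spreadability Index column by edge ratio × neighborhood density + node weight, to within rounding of the printed figures. The function names are hypothetical.

```python
# Rows of Table 1: (protein, E_out, E_in, neighborhood density, node weight, reported SI)
TABLE_1 = [
    ("Node 3", 6, 3, 6.94, 2.83, 14.99),
    ("Node 9", 5, 4, 7.07, 3.00, 11.48),
    ("Node 6", 5, 2, 3.93, 2.60, 10.46),
    ("Node 8", 6, 2, 2.27, 3.25, 8.55),
    ("Node 1", 5, 4, 4.21, 3.40, 8.45),
]


def edge_ratio(e_out, e_in):
    # Assumed form: the +1 terms reproduce the Edge Ratio column, e.g. (6 + 1) / (3 + 1) = 1.75
    return (e_out + 1) / (e_in + 1)


def spreadability_index(e_out, e_in, density, weight):
    # Assumed combination, inferred from the tabulated values
    return edge_ratio(e_out, e_in) * density + weight


for name, e_out, e_in, nd, nw, reported in TABLE_1:
    si = spreadability_index(e_out, e_in, nd, nw)
    print(f"{name}: edge ratio = {edge_ratio(e_out, e_in):.2f}, SI = {si:.2f} (reported {reported})")
```

The small differences from the reported values (at most about 0.02) are consistent with the neighborhood density and node weight having been rounded to two decimals before printing.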

Table 2

Computation of the LAC of the synthetic network (Fig. 1) and validation of selected top 5 spreader nodes by the SIS model.

Rank	Proteins	LAC	Sum of SIS infection rate of top 5 nodes
1	Node 1	2.00	0.86
2	Node 9	1.60
3	Node 5	1.33
4	Node 8	1.33
5	Node 3	1.20

Synthetic protein–protein interaction network. The network consists of 10 nodes and 25 edges. The neighborhood density (ND), Node Weight (NW), and Spreadability Index (SI) of the top 5 nodes have been highlighted. While the thickness of the edges highlights the rank according to SI, the thickness of the boundary of the nodes highlights the rank according to NW.

Fuzzy PPI model for potential SARS-CoV2-human interaction identification

The binding affinity between any two interacting proteins can be estimated by combining the semantic similarity scores of the GO terms associated with the proteins [26], [35], [45], [46], [47]. A greater number of semantically similar GO annotations between a protein pair indicates higher interaction affinity. The fuzzy PPI model is a hybrid approach [35] that utilizes both the topological [48] features of the GO graph and the information content [47], [49], [50] of the GO terms. GO is organized in three independent directed acyclic graphs (DAGs): molecular function (MF), biological process (BP), and cellular component (CC) [34]. The nodes in each GO graph represent GO terms, and the edges represent different hierarchical relationships. In this work, the two most essential relations, 'is_a' and 'part_of', have been used [51]. The semantic similarity between any two proteins is estimated by considering the similarities between all pairs of their annotating GO terms belonging to a particular ontological graph. The similarity of a GO term pair is determined by considering specific topological properties (shortest path length) of the GO graph and the average information content (IC) [52] of the disjunctive common ancestors (DCA) [45], [46] of the GO terms, as proposed in [35]. Fuzzy PPI first relies on a fuzzy clustering of the GO graph, where the selection of GO terms as cluster centers is based on the level of association of each GO term in the GO graph, quantified by a proportion measure. The proportion measure for any GO term $t$ is computed as

$$PM(t) = \frac{|An(t)| + |Dn(t)|}{N_O} \qquad (1)$$

where $An(t)$ and $Dn(t)$ represent the ascendants and descendants of term $t$, and $N_O$ is the total number of GO terms in ontology $O$. A higher value of the proportion measure $PM(t)$ signifies higher coverage of ascendants and descendants associated with the specific node.
Finally, the GO terms for which this proportion measure is above a predefined threshold are selected as cluster centers. In this work, the cluster centers are chosen based on the threshold values suggested in [26], [35]. After selecting the cluster centers, the degree of membership of a GO term to each of the selected cluster centers is calculated using its shortest path length to the corresponding cluster center. The membership of the GO term to a cluster decreases with an increase in its shortest path length to the cluster center. The membership function (equation (2)) is parameterized by its center $c$ and its width $\sigma$, and takes as its argument the shortest path length from the GO term $t$ to the cluster center. The difference in membership values between the GO pair $t_i$ and $t_j$ for each cluster center is used to compute the weight parameter (equation (3)). This weight determines how different two GO terms can be with respect to the cluster centers. Next, the shared information content (SIC) is computed using the average information content (IC) [52] of the DCA of the GO term pair ($t_i$, $t_j$) for all three GO graphs. The SIC is defined as

$$SIC(t_i, t_j) = \frac{1}{|DCA(t_i, t_j)|} \sum_{a \in DCA(t_i, t_j)} IC(a) \qquad (4)$$

where $DCA(t_i, t_j)$ represents the disjunctive common ancestors of GO terms $t_i$ and $t_j$. The semantic similarity (SS) between the GO term pair $t_i$ and $t_j$ (equation (5)) combines this weight with the SIC [35]. The semantic similarity of a protein pair ($P_i$, $P_j$) for each GO type (CC, MF, and BP) is estimated as the maximum similarity over all possible GO pairs from the annotations of proteins $P_i$ and $P_j$ for that GO type. The interaction affinity of the protein pair ($P_i$, $P_j$) is defined as the average of the CC-, MF-, and BP-based semantic similarities. This work uses the available ontological information to calculate the fuzzy interaction affinity score between the protein pairs of SARS-CoV2 and spreader human proteins (please see Fig. 3). Here, the level-1 and level-2 spreader proteins of SARS-CoV are employed as the primary targets for the proposed fuzzy PPI model for interaction affinity computation.
A bipartite relation of GO pairs is first generated from each pair of proteins for each type of GO annotation (CC, MF, and BP) independently (Fig. 3A). To reduce the computational overhead and time, semantic similarity scores are precomputed between all GO pairs belonging to a particular GO type using equation (5) [35]. The semantic similarity is computed by exploring the topological properties of the GO subgraphs. For each type of GO subgraph, a different set of cluster center nodes (GO terms) is identified based on the proportion measure (equation (1)), which relies on the annotation score and the hierarchy of the GO relationship graph. The GO semantic similarity is estimated with a distance-based measure between the target GO pair by exploring the membership scores (equations (2) and (3)) and SIC values (equation (4)) relative to the respective cluster centers of each GO subgraph (Fig. 3B). For each GO type, the maximum of all scores of the bipartite links in a particular GO subgraph is taken as the final semantic score for that type of GO.
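The cluster-center selection step can be sketched on a toy GO-like DAG. The sketch assumes that the proportion measure of a term is the number of its ancestors plus the number of its descendants, divided by the total number of terms in the ontology, and that a term becomes a cluster center when this value exceeds a threshold. The term names, the threshold of 0.7, and the function names are hypothetical; the thresholds actually used are those of [26], [35].

```python
def reachable(start, edges):
    """All nodes reachable from `start` by following `edges` (dict: node -> set of nodes)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def proportion_measure(parents):
    """parents: dict mapping a term to the set of its direct parents (is_a / part_of)."""
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    # Invert the parent relation to obtain direct children
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    n_terms = len(terms)
    return {
        t: (len(reachable(t, parents)) + len(reachable(t, children))) / n_terms
        for t in terms
    }


def cluster_centers(parents, threshold):
    """Terms whose proportion measure is above the (hypothetical) threshold."""
    return {t for t, pm in proportion_measure(parents).items() if pm > threshold}


# Toy ontology: root -> A, B; A, B -> C; C -> D
TOY_PARENTS = {"A": {"root"}, "B": {"root"}, "C": {"A", "B"}, "D": {"C"}}

for term, pm in sorted(proportion_measure(TOY_PARENTS).items()):
    print(f"{term}: PM = {pm:.2f}")
print("cluster centers:", sorted(cluster_centers(TOY_PARENTS, 0.7)))
```

Whether a term counts itself among its ancestors or descendants is not stated in this section; the sketch excludes it.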

Fig. 3

Schematic diagram of Fuzzy PPI model. A) The fuzzy PPI model finds the interaction affinity between the SARS-CoV2 and Human proteins (L1 and L2 spreader of SARS-CoV) using ontological gene information. B) All GO pair-wise interaction affinities are assessed from three independent GO-relationship graphs CC, MF, and BP. The fuzzy interaction affinity of a protein pair is computed from all three pair-wise scores of all GO-pair affinities. C) Heatmap representation of Fuzzy PPI score. D) Network representation of Human and SARS-CoV2 proteins with 0.2 onward thresholds of Fuzzy PPI score at high specificity. Finally, high-quality interactions are extracted to retrieve the potential human prey for SARS-CoV2 at the 0.4 threshold.

Fig. 2. A computational model for the selection of spreader nodes in Human-SARS-CoV PPIN by spreadability index. Red-colored nodes represent SARS-CoV proteins, while blue-colored nodes are the selected spreader nodes in it. Deep green colored nodes represent level-1 human connected proteins with SARS-CoV proteins, while yellow-colored nodes represent the selected human spreaders. Light green colored nodes represent level-2 human spreaders of SARS-CoV. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Similarly, all three different scores are evaluated and averaged to find the interaction affinity for the annotated protein pair. Then, the fuzzy score of interaction affinity is computed by normalizing the interaction affinity using max–min normalization. Finally, with a high specificity threshold (please see Fig. 6), high-quality interactions (78 interactions involving 37 human level-1 spreaders) are extracted for Human-SARS-CoV2.
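The protein-level steps of this section (maximum over GO pairs per ontology, average over CC, MF, and BP, max–min normalization, and thresholding) can be sketched as follows. The GO-pair similarity values are made-up placeholders standing in for precomputed equation (5) scores, and the protein identifiers, GO term names, and function names are hypothetical. Treating a missing GO pair or a missing annotation as similarity 0.0 is an assumption of this sketch.

```python
from itertools import product

ONTOLOGIES = ("CC", "MF", "BP")


def ontology_similarity(terms_a, terms_b, pair_scores):
    """Maximum precomputed similarity over all GO term pairs of one ontology."""
    scores = [pair_scores.get(frozenset((a, b)), 0.0) for a, b in product(terms_a, terms_b)]
    return max(scores, default=0.0)  # assumption: no annotated pair -> 0.0


def interaction_affinity(annot_a, annot_b, go_sim):
    """Average of the CC-, MF- and BP-based similarities of a protein pair."""
    per_type = [
        ontology_similarity(annot_a.get(o, ()), annot_b.get(o, ()), go_sim.get(o, {}))
        for o in ONTOLOGIES
    ]
    return sum(per_type) / len(per_type)


def min_max_normalize(scores):
    """Rescale raw affinities to fuzzy scores in [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - lo) / (hi - lo) for pair, s in scores.items()}


def high_quality(fuzzy_scores, threshold=0.4):
    """Pairs whose fuzzy score reaches the threshold (0.4 in the text)."""
    return {pair for pair, s in fuzzy_scores.items() if s >= threshold}


# Placeholder GO-pair similarities and annotations (all hypothetical)
GO_SIM = {
    "CC": {frozenset(("cc1", "cc2")): 0.8},
    "MF": {frozenset(("mf1", "mf2")): 0.5, frozenset(("mf1", "mf3")): 0.9},
    "BP": {frozenset(("bp1", "bp2")): 0.1},
}
VIRAL = {"CC": ["cc1"], "MF": ["mf1"], "BP": ["bp1"]}
HUMAN = {
    "H1": {"CC": ["cc2"], "MF": ["mf2", "mf3"], "BP": ["bp2"]},
    "H2": {"CC": ["cc9"], "MF": ["mf2"], "BP": ["bp2"]},
    "H3": {"CC": [], "MF": [], "BP": ["bp2"]},
}

raw = {("V1", h): interaction_affinity(VIRAL, annot, GO_SIM) for h, annot in HUMAN.items()}
fuzzy = min_max_normalize(raw)
print(sorted(high_quality(fuzzy)))
```

Because max–min normalization depends on the smallest and largest raw affinities in the scored set, the fuzzy score of a pair is relative to the full candidate set rather than an absolute quantity.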

Fig. 6

Specificity at different thresholds (x-axis) of binding affinity obtained from the Fuzzy PPI model for the complete human proteome interaction network. From a threshold of 0.2 onward, the model produces high specificity with respect to benchmark positive and negative interaction data. High-quality interactions are extracted at a 0.4 threshold with ∼99.9% specificity.

Dataset description

SARS-CoV-Human PPIN serves as a baseline for our model. The potential level-1 and level-2 human spreaders of SARS-CoV form the candidate set for selecting level-1 human spreaders of SARS-CoV2. The datasets curated for this purpose are outlined below:

Human PPIN

The dataset [53], [54] consists of all possible interactions between human proteins experimentally documented in humans. Human proteins are represented as nodes, while edges represent the physical interactions between proteins. It is a collection of 21,557 nodes and includes 342,353 edges/interactions.

SARS-CoV PPIN

The dataset [30] consists of interactions between SARS-CoV proteins. It contains 7 unique proteins connected by 17 interaction edges. Only the densely connected proteins are considered, rather than the isolated ones, since the former play a more active role in the transmission of infection than the latter.

SARS-CoV-human PPIN

The dataset [30] comprises 118 interactions between SARS-CoV and humans. It is used to fetch the level-1 human interactions of SARS-CoV.

SARS-CoV2 proteins

This data is collected from the pre-released dataset of available SARS-CoV2 proteins from UniProtKB [33], [55], which includes 14 reviewed SARS-CoV2 proteins.

GO graph and protein-GO annotations

GO graph types (CC, MF, and BP) are collected from GO Consortium [34], [51]. In addition, the protein to GO-annotation map is retrieved from the UniProtKB database.

Potential COVID-19 FDA drugs

Six potential FDA drugs, Lopinavir [56], Ritonavir [57], Azithromycin [58], Remdesivir [59], [60], [61], Favipiravir [62], [63], and Darunavir [64], have been identified from the white paper [37] published by DrugBank [65] and are used for validation of our proposed model.

Results and discussion

Our developed computational model of nCoV-Human PPIN contains high-quality interactions (HQI) and proteins identified by Fuzzy affinity thresholding and spreadability index validated by the SIS model. The sources of input and the generated results always play a crucial role in any computational model, which is also true for our proposed model.

Spreader nodes selection in Human-SARS CoV interaction network using spreadability index

The SARS-CoV-Human PPIN (up to level-2) is formed by combining the SARS-CoV-Human and Human-Human PPIN datasets. The SARS-CoV-Human dataset provides the direct level-1 human interactions of SARS-CoV, while the Human-Human PPIN dataset is used to fetch the corresponding level-2 human interactions. Potential spreader nodes are identified using the spreadability index and validated by the SIS model [14]. The entire process of detecting spreader nodes in the SARS-CoV-Human PPIN is depicted in four steps in Fig. 2 (used only for description): 1) Spreader nodes (6 spreaders) in the SARS-CoV PPIN are detected by the spreadability index. 2) The level-1 human proteins corresponding to the spreader nodes in the SARS-CoV PPIN are identified. 3) Spreader nodes (24 spreaders) among these level-1 human proteins are detected. 4) The same process is repeated, and spreader nodes (9 spreaders) among the level-2 human proteins are identified. The selected spreader nodes in the SARS-CoV-Human PPIN are highlighted in additional Table A1, Table A2, and Table A3. The network view of the SARS-CoV-Human PPIN at each level and at various selected thresholds of the spreadability index is also available online (SARS-CoV human spreaders link: L-1, human spreaders at the high threshold of spreadability index link: L1 & L2:high, and human spreaders at the low threshold link: L1 & L2:low).
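The construction of the two-level network can be sketched with plain edge lists. The protein identifiers and the function name are hypothetical, and excluding level-1 proteins from the level-2 set is an assumption of this sketch. In the full procedure, the spreadability index would be applied at each level to keep only the spreader nodes before expanding further.

```python
def level1_and_level2(virus_host_edges, human_edges):
    """Return (level-1, level-2) human proteins.

    virus_host_edges: iterable of (viral protein, human protein) pairs.
    human_edges: iterable of undirected (human protein, human protein) pairs.
    """
    # Level-1: human proteins that interact directly with a viral protein
    level1 = {human for _, human in virus_host_edges}

    # Build an undirected adjacency map of the human PPIN
    neighbours = {}
    for a, b in human_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Level-2: human neighbours of level-1 proteins (assumption: level-1 excluded)
    level2 = set()
    for human in level1:
        level2 |= neighbours.get(human, set())
    return level1, level2 - level1


# Toy data with hypothetical identifiers
VIRUS_HOST = [("V1", "H1"), ("V2", "H2")]
HUMAN_PPIN = [("H1", "H3"), ("H2", "H1"), ("H4", "H5")]

l1, l2 = level1_and_level2(VIRUS_HOST, HUMAN_PPIN)
print("level-1:", sorted(l1))
print("level-2:", sorted(l2))
```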

Fig. 2

Identification of the nCoV-Human proteins interactions using fuzzy PPI model

GO information can be used to infer the binding affinity of any pair of interacting proteins from three different types of GO hierarchical relationship graphs (CC, MF, and BP) [34]. The fuzzy PPI model has been applied to find the interaction affinity between SARS-CoV2 and human proteins using GO-based information (please see Fig. 3 and section 2.2 for details). To identify the human interactors of SARS-CoV2 using the Fuzzy PPI model, a set of candidate proteins is selected, namely the L1 and L2 spreader nodes of SARS-CoV identified using the SIS model (as depicted in Fig. 2). The fuzzy PPI model is constructed from the ontological relationship graphs by evaluating the affinity between all possible GO pairs annotated to a target protein pair. Finally, the fuzzy score of interaction affinity of a protein pair is computed from these GO pair-wise interaction affinities and scaled into the range [0,1]. We have used experimentally validated human protein interactions (physical only) from publicly available interaction databases, such as HIPPIE [66], STRING [67], BioGRID [68], DIP [69], and HuRI [70], for positive data, and Negatome 2.0 [71] and Trabuco et al. [72] for negative data. The positive interactions are also filtered by removing the edges that are common to both the positive and negative interaction sets. For each database, gold-standard data is curated using the scoring scheme provided by that database. The selection criteria are described in Table S6 in the supplementary document. With this benchmark data set, the Fuzzy PPI model has been assessed at different fuzzy score cut-off values. The results of this assessment are reported in Table S7 in the supplementary document. In any classification task, specificity signifies the ability to correctly reject negative samples, so a high specificity corresponds to few false positives. We therefore used the specificity metric to identify high-quality positive interactions.
With increasing specificity, the number of false-positive (FP) interactions falls sharply, as depicted in the following table. At thresholds of ≥0.2 and ≥0.4, the FP is 0.0048% and 0.0001% of the total negative interactions, respectively. Thus, the fuzzy score threshold for high specificity is set at ≥0.4. The heatmap representation of fuzzy interaction affinities (with a score of ≥0.2, for very high specificity of ∼99%) is shown in additional Fig. A1 and Table A4. The high-quality interactions (HQI) are retrieved at a threshold of 0.4 (∼99.98% specificity), which results in a total of 78 interactions between SARS-CoV2 and humans (37 human level-1 spreaders). The interaction networks predicted from the Fuzzy PPI model are shown in Fig. 4.
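The relationship between the score threshold, false positives, and specificity can be made concrete with a small sketch. Specificity is TN / (TN + FP), computed over the benchmark negative pairs only: a negative pair whose fuzzy score reaches the threshold counts as a false positive. The scores below are invented for illustration and are not benchmark data; the function name is hypothetical.

```python
def specificity(negative_scores, threshold):
    """TN / (TN + FP) for a score cut-off, evaluated on known non-interacting pairs."""
    false_positives = sum(1 for s in negative_scores if s >= threshold)
    true_negatives = len(negative_scores) - false_positives
    return true_negatives / len(negative_scores)


# Invented fuzzy scores of benchmark negative pairs (illustration only)
NEGATIVE_SCORES = [0.01, 0.03, 0.05, 0.08, 0.10, 0.12, 0.15, 0.22, 0.31, 0.45]

for cutoff in (0.1, 0.2, 0.4, 0.5):
    print(f"threshold >= {cutoff}: specificity = {specificity(NEGATIVE_SCORES, cutoff):.2f}")
```

Raising the cut-off can only remove false positives, so specificity never decreases as the threshold grows; the cost, not shown here, is that fewer true interactions are retained.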

Fig. 4

Network representation of HQIs (score ≥ 0.4) between SARS-CoV2 and human proteins. Blue and yellow spherical nodes represent the SARS-CoV2 and human proteins, respectively. The edge width reflects the fuzzy score of binding affinity. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Identification of human spreader proteins for nCoV

Human proteins present in the high-quality interactions of the nCoV-Human PPIN, obtained by applying the fuzzy affinity threshold, are considered level-1 spreaders. From these 37 level-1 spreaders, the corresponding level-2 human interactions are obtained using the Human-Human PPIN dataset. The spreadability index is then computed for these level-2 human proteins to identify level-2 human spreader nodes, and the SIS model verifies the selection. The selected spreader nodes in the SARS-CoV2-Human PPIN (2474 level-2 human spreaders under the high threshold) are highlighted in additional Table A4, Table A5, and Table A6. In addition, the computational model of the nCoV-Human PPIN under a high threshold is available online at https://kzeumvafuq8hob5bzpsphq-on.drv.tw/www.40highthreshold_sarscov2.com/sarcov2_graph_40_percent_high_threshold.html. It highlights the human level-1 (marked in yellow) and level-2 spreader nodes (marked in green). The network view of the SARS-CoV2-Human PPIN at each level and at various selected thresholds is also available online (SARS-CoV2 Level-1 human spreaders, Level-1 & Level-2:high spreaders at the high threshold of spreadability index and Level-1 & Level-2:low human spreaders at a low threshold of spreadability index).

Validation using potential FDA drugs for COVID-19

After assessment of all potential drugs mentioned in the DrugBank [65] white paper [37], six drugs, Lopinavir [56], Ritonavir [57], Azithromycin [58], Remdesivir [59], [60], [61], Favipiravir [62], [63], and Darunavir [64], are identified as showing some of the expected results in clinical trials for SARS-CoV2 treatment. All approved human protein targets for each of the six drugs are fetched from the advanced search section [73] of DrugBank [65], [74]. When searched in our proposed model of the nCoV-Human PPIN, these targets are found among the spreader nodes. This indicates that the selected spreader nodes are biologically important in transmitting infection in the network, which makes them protein targets of the potential FDA drugs for COVID-19. The target protein hits in our nCoV-Human PPIN for each of the six potential FDA drugs are highlighted in Fig. 5. There are 3 target protein hits for Ritonavir, 2 for each of Lopinavir, Darunavir, and Azithromycin, and 1 for each of Remdesivir and Favipiravir. Of these protein targets, ACE2 is the most important, since it is considered one of the crucial human receptors through which nCoV transmits infection deep inside the human cell [75], [76], [77]. Based on this validation, further research combining a drug repurposing study, a docking study, and COVID-19 symptom-based analysis is conducted in our next work [78], which helps us identify a possible drug for COVID-19, Fostamatinib [79], [80], [81]. Clinical studies involving Fostamatinib are also in progress [82], [83]. Although this research is at an initial stage, it lends some support to our findings.
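The validation step amounts to intersecting each drug's known human targets with the set of identified spreader proteins. The sketch below uses placeholder drug and protein names throughout, since the actual target lists come from DrugBank and are not reproduced in this section; the function name is hypothetical.

```python
def target_hits(drug_targets, spreaders):
    """Map each drug to the sorted list of its targets that are also spreader proteins."""
    spreader_set = set(spreaders)
    return {drug: sorted(set(targets) & spreader_set) for drug, targets in drug_targets.items()}


# Placeholder data (hypothetical names, not taken from DrugBank)
DRUG_TARGETS = {
    "drug_A": ["P1", "P2", "P7"],
    "drug_B": ["P3"],
    "drug_C": ["P8", "P9"],
}
SPREADERS = {"P1", "P2", "P3", "P4", "P5"}

hits = target_hits(DRUG_TARGETS, SPREADERS)
for drug, found in hits.items():
    print(f"{drug}: {len(found)} target hit(s) {found}")
```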

Fig. 5

Validation of our developed computational model with respect to the target proteins of the FDA-accepted drugs for COVID-19 treatment. Yellow- and green-colored nodes denote level-1 and level-2 human spreaders of nCoV, which act as the drug-protein targets. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Conclusion

In any host-pathogen interaction network, the identification of spreader nodes is crucial for disease prognosis. However, not every protein in an interaction network has an intense disease-spreading capability. In this work, we have used the SARS-CoV-Human PPIN and identified its spreader nodes at both level-1 and level-2 using the SIS model. These spreader nodes are then used to compute the protein interaction affinity score and thereby uncover the level-1 human spreaders of nCoV. In addition, GO annotations have been considered along with PPIN properties to make the model more effective. As the work progressed, it has been observed that the human spreader nodes identified by our proposed model emerge as potential protein targets of the FDA-approved drugs for COVID-19. The primary hypotheses of the work may be listed as follows: 1) There is a genetic overlap of ∼89% [84] between SARS-CoV and SARS-CoV2, which also leads to a significant overlap in spreader proteins between the human-SARS-CoV and human-SARS-CoV2 protein-interaction networks. 2) The Fuzzy PPI approach can assess protein interaction affinities at very high specificity with respect to benchmark datasets, as shown in Fig. 6. High specificity signifies a very low false-positive rate at a given threshold. Thus, at a 0.4 threshold (∼99.9% specificity), the proposed model evaluates high-quality positive interactions in the Human-nCoV PPIN. Finally, we propose that the developed computational model effectively identifies Human-nCoV PPIs with high specificity.

Fig. 6

Specificity at different thresholds (x-axis) of binding affinity obtained from the Fuzzy PPI model for the complete human proteome interaction network. From a threshold of 0.2 onward, the model produces high specificity with respect to benchmark positive and negative interaction data. High-quality interactions are extracted at a 0.4 threshold with ∼99.9% specificity.
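The relationship between the affinity threshold and specificity in the second hypothesis can be made concrete with a short sketch. Specificity is TN / (TN + FP), computed on benchmark negative pairs. The scores below are invented for illustration, and treating a score greater than or equal to the threshold as a predicted interaction is an assumption, not a detail taken from the Fuzzy PPI model.

```python
def specificity(negative_scores, threshold):
    """Fraction of benchmark negative pairs scored below the threshold.

    A negative pair with score >= threshold is counted as a false positive
    (this decision rule is an assumption made for the sketch).
    """
    if not negative_scores:
        raise ValueError("at least one negative score is required")
    false_positives = sum(1 for s in negative_scores if s >= threshold)
    true_negatives = len(negative_scores) - false_positives
    return true_negatives / len(negative_scores)

# Invented affinity scores for ten benchmark non-interacting pairs.
negative_scores = [0.05, 0.10, 0.15, 0.22, 0.35, 0.41, 0.08, 0.12, 0.03, 0.19]

# Specificity can only stay equal or rise as the threshold is raised.
sweep = {t: specificity(negative_scores, t) for t in (0.1, 0.2, 0.3, 0.4, 0.5)}
```

Raising the threshold trades coverage for fewer false positives, which is why a high threshold is used when only high-quality interactions are wanted.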
The nCoV-Human interactions are inferred from SARS-CoV, the cause of the 2003 epidemic, which is genetically highly similar to nCoV. We also compute the spreadability index of the human spreader proteins (up to level-2) and validate it through the SIS model. Because human interaction networks are dense, the number of proteins increases with the transition from one level to the next. Our proposed model can therefore also identify human spreader proteins at level-2 by using the spreadability index validated by the SIS model. Our proposed method has identified ACE2 and TMPRSS2 as interactors of SARS-CoV2 proteins, both of which are essential for entry into the human host. SARS-CoV2 interacts with the SARS-CoV entry receptor ACE2 because SARS-CoV2 preserves those amino acid residues of SARS-CoV that are essential for ACE2 binding [85]. However, the binding strength of SARS-CoV2 with ACE2 is 10 to 20 times greater than that of the SARS-CoV-ACE2 attachment [86]. This is because several changes occur in the receptor-binding domain (RBD) of the SARS-CoV2 spike protein [87]. In addition, the cellular serine protease TMPRSS2 primes SARS-CoV2 for host entry, and a serine protease inhibitor blocks SARS-CoV2 infection of lung cells [85], [88]. Thus, TMPRSS2 activity is essential for viral spread and pathogenesis in the infected host [85], [89]. In a recent study [90], Gordon et al. identified 332 high-confidence SARS-CoV2-human protein–protein interactions based on sequence analysis of SARS-CoV2 isolates. They cloned, tagged, and expressed 26 of the 29 SARS-CoV2 proteins in human cells and identified the human proteins physically associated with each using affinity-purification mass spectrometry (AP-MS). However, while comparing their seminal work with ours, we found that the SARS-CoV2 protein sequences used by Gordon et al. do not map directly to the available UniProt accession ids.
In our case, we have worked only on the UniProt-listed SARS-CoV2 proteins and applied a mathematical model of binding affinity assessment to a subset of UniProt-listed reviewed human proteins. Therefore, direct comparison and validation with respect to Gordon et al. was not possible, primarily because no direct mapping of their SARS-CoV2 proteins to UniProt accession ids is available. However, an attempt has been made to map the SARS-CoV2 proteins of Gordon et al. to UniProt ids using the COVID-19 UniProtKB reference data [55] (please see Table S8 in the supplementary document). Table S8 shows that although UniProt ids are available for some of these proteins, GO annotations are missing for most of them. Another interesting observation is that the entries marked in green have also been taken into consideration in this research work. It should be noted that the current work depends heavily on the underlying GO network of the host-pathogen PPIN. As evident from Table S8, GO annotations are often missing in the new protein list. Therefore, we are working on a new strategy for the computational prediction of GO annotations for the proteins [16], [17], [18], [19] in Gordon's list as well as for new mutant variants. One of the key highlights of our study is that the target proteins of the potential FDA drugs for COVID-19 overlap with the spreader nodes of the proposed nCoV-Human protein interaction network. Target proteins of the six potential FDA drugs for COVID-19 mentioned in the DrugBank white paper [37], namely Lopinavir [56], Ritonavir [57], Azithromycin [58], Remdesivir [59], [60], [61], Favipiravir [62], [63], and Darunavir [64], overlap with the spreader nodes of the proposed in silico nCoV-Human protein interaction model (see Fig. 5).
Although clinical trials for a COVID-19 vaccine are still under way, two of the six repurposed drugs, Remdesivir [91] and Favipiravir [92], are found to be the most promising and effective. Our proposed model successfully identified their protein targets R1AB_SARS2, TLR9, ACE2, CYP3A4, and ABCB1 as spreader nodes. This assessment indicates that these spreader nodes have biological relevance to disease propagation. It also motivates a drug repurposing study on the generated SARS-CoV2-human PPIN in our subsequent research work [78], which highlights Fostamatinib/R406 as one of the potential drugs to be used against SARS-CoV2.

CRediT authorship contribution statement

Sovan Saha: Conceptualization, Data curation, Methodology, Writing – original draft, Software. Anup Kumar Halder: Conceptualization, Data curation, Methodology, Writing – original draft, Software. Soumyendu Sekhar Bandyopadhyay: Data curation, Writing – original draft, Visualization. Piyali Chatterjee: Supervision, Investigation, Formal analysis, Writing – review & editing. Mita Nasipuri: Supervision, Project administration, Investigation, Formal analysis, Writing – review & editing. Subhadip Basu: Supervision, Project administration, Investigation, Formal analysis, Data curation, Methodology, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
