Literature DB >> 34902608

VDA-RWLRLS: An anti-SARS-CoV-2 drug prioritizing framework combining an unbalanced bi-random walk and Laplacian regularized least squares.

Ling Shen¹, Fuxing Liu¹, Li Huang², Guangyi Liu¹, Liqian Zhou³, Lihong Peng⁴.

Abstract

BACKGROUND: A new coronavirus disease named COVID-19, caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), is rapidly spreading worldwide. However, there is currently no effective drug to fight COVID-19.
METHODS: In this study, we developed a Virus-Drug Association (VDA) identification framework (VDA-RWLRLS) combining unbalanced bi-Random Walk, Laplacian Regularized Least Squares, molecular docking, and molecular dynamics simulation to find clues for the treatment of COVID-19. First, virus similarity and drug similarity are computed based on genomic sequences, chemical structures, and Gaussian association profiles. Second, an unbalanced bi-random walk is implemented on the virus network and the drug network, respectively. Third, the results of the random walks are taken as the input of Laplacian regularized least squares to compute the association score for each virus-drug pair. Fourth, the final associations are characterized by integrating the predictions from the virus network and the drug network. Finally, molecular docking and molecular dynamics simulation are implemented to measure the potential of screened anti-COVID-19 drugs and further validate the predicted results.
RESULTS: In comparison with six state-of-the-art association prediction models (NGRHMDA, SMiR-NBI, LRLSHMDA, VDA-KATZ, VDA-RWR, and VDA-BiRW), VDA-RWLRLS demonstrates superior VDA prediction performance. It obtains the best AUCs of 0.885 8, 0.835 5, and 0.862 5 on the three VDA datasets. Molecular docking and dynamics simulations demonstrated that remdesivir and ribavirin may be potential anti-COVID-19 drugs.
CONCLUSIONS: Integrating unbalanced bi-random walks, Laplacian regularized least squares, molecular docking, and molecular dynamics simulation, this work initially screened a few anti-SARS-CoV-2 drugs and may contribute to preventing COVID-19 transmission.

Entities: Chemical

Keywords: Antiviral drug; Laplacian regularized least squares; Molecular docking; Molecular dynamics simulation; SARS-CoV-2; Unbalanced bi-random walk; Virus-drug association

Year: 2021 PMID： 34902608 PMCID： PMC8664497 DOI： 10.1016/j.compbiomed.2021.105119

Source DB: PubMed Journal: Comput Biol Med ISSN： 0010-4825 Impact factor: 4.589

Introduction

Background

In December 2019, a new coronavirus disease named COVID-19 by the World Health Organization broke out in Wuhan, China. COVID-19 is caused by an unknown coronavirus called severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). As of May 13, 2021, COVID-19 had a cumulative total of 159,949,065 confirmed cases with over 3,322,439 deaths [1]. With the continuing threat of SARS-CoV-2 to global health, it is urgent to develop effective prevention and treatment strategies for SARS-CoV-2 transmission [[2], [3], [4], [5], [6], [7]]. Although pharmaceutical companies have invested enormous money and time in drug research and development over the past two decades, the number of new chemical entity drugs approved by the FDA is still relatively limited every year. Drug repositioning, applied to reposition an existing active pharmaceutical ingredient for a new indication, helps to reduce the attrition experienced in the process of new drug discovery [8,9]. Therefore, drug repositioning has been a research hotspot in drug discovery. It is unrealistic to find therapeutic options for COVID-19 patients by experimental methods under such an urgent situation. Subsequently, systematic identification of anti-SARS-CoV-2 drugs by repositioning FDA-approved drugs offers an effective way to find possible pharmaceutical ingredients for COVID-19 [10,11]. However, due to a lack of knowledge of complete drug-target interactions, designing promising and affordable methods for the effective treatment of COVID-19 is challenging. SARS-COV-2 is a new single-stranded RNA virus [12] and has high homology with severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) [13,14]. In addition, the spike (S) protein and host angiotensin-converting enzyme 2 (ACE2) have been recognized as two target proteins of COVID-19 in clinical trials [15]. The above knowledge provides clues for repositioning available drugs for COVID-19 [[15], [16], [17]].

Related work

Initial screening of potential antifungal drugs for COVID-19

Potential drugs against COVID-19 can be roughly divided into three main categories: antifungal drugs, antibacterial drugs, and antiviral drugs [18]. Roudbary et al. [19] and salehi et al. [20] observed that patients with COVID-19 infection are still extremely susceptible to fungal or bacterial infections. Chen et al. [21] found fungals related to co-infections in COVID-19, such as Aspergillus spp., Candida albicans, Candida dubliniensis, Candida glabrata, Candida krusei, Candida tropicalis, and Candida parapsilosis sensu stricto. Roudbary et al. [19] reported that fungals including aspergillosis, candidemia, Histoplasma spp., Rhizopus spp., Mucor spp., Cryptococcus spp. have been increasingly associated with COVID-19 co-infections caused by opportunistic fungal diseases. Therefore, new antifungal drug screening is currently urgently needed among COVID-19 patients [22,23]. Song et al. [24] provided a clinical diagram to help clinicians and laboratory experts manage aspergillosis, candidiasis, cryptococcosis, and mucormycosis in COVID-19 patients. Mohamed et al. [25] revealed that itraconazole, an antifungal drug, is one of the best-documented multi-target drugs.

Initial screening of potential antibacterial drugs for COVID-19

A bacterium plays an important role in various environments [26]. In COVID-19, bacterial co-pathogens are observed to be co-infectious and drastically increase morbidity and mortality rates [27]. Rawson et al. [28] reported that a few COVID-19 patients have been co-infected by bacteria, such as Acinetobacter baumannii, Escherichia coli, Haemophilus influenzae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus, and Streptococcus pneumoniae. Biosurfactants generated by bacterial species have wide antimicrobial activity, biocompatibility, biodegradability, low toxicity and better interfacial activity than traditionally produced chemical surfactants [26,29]. Hassan Mahmoudi [30] obtained blood cultures and endotracheal aspirates from COVID-19 patients based on a cross-sectional study and confirmed the bacterial isolates. They identified bacteria isolated through blood culture and endotracheal aspirate, for example, Enterobacter species 5, Escherichia coli 7, Klebsiella species 11, methicillin-resistant Staphylococcus aureus 6, methicillin-sensitive Staphylococcus aureus 9, Pseudomonas aeruginosa 4, and Streptococcus pneumoniae 1. Li et al. [31] revealed that secondary bacterial infection is one of the main complications in COVID-19 patients with high mortality. They found that Gram-negative bacteria (especially A. baumannii and K. pneumoniae) are the main bacteria. Thus, precise application of antibacterial drugs constitutes a promising alternative to COVID-19 patients with secondary bacterial infection [29,31,32]. Several works are dedicated to finding antimicrobial compounds produced by bacterial species to prevent the continuing transmission of COVID-19. Narendrakumar et al. [33] presented that doxycycline can inhibit bacterial protein synthesis and shows antibacterial activity in COVID-19 [33]. Pan et al. [34] reported that Shuanghuanglian, an antiviral and antibacterial compound, has been applied to COVID-19 treatment in China.

Initial screening of potential antiviral drugs for COVID-19

To accelerate drug discovery of COVID-19, multiple computational models were presented to assist in finding antiviral drugs by drug repositioning. Currently, computational drug-repurposing methods for initially screening anti-COVID-19 drugs can be roughly classified into structure-based approaches, machine learning-based approaches, and network-based approaches. Structure-based approaches [[35], [36], [37], [38]] aim to understand the binding mechanism of compounds with the targets of COVID-19 mainly by molecular docking, Molecular Dynamic Simulations (MDS) and binding energy computation. Machine learning-based methods mainly use machine learning models, especially deep learning models, to fight COVID-19 effects. BenevolentAI [39] extracted numerous structured medical information from recent reports by machine learning and predicted that baricitinib may be therapeutic strategy for COVID-19. Ge et al. [5] developed a data-driven drug repurposing framework integrating machine learning models and statistical analysis approaches to find possible drugs against SARS-CoV-2. Ray et al. [40] used a variational graph autoencoder to fight COVID-19. Ke et al. [41] developed a deep learning method to search available compounds effective against coronaviruses. Beck et al. [42] proposed a previously trained deep learning model called molecule transformer-drug-target interactions to find commercially available antiviral drugs against COVID-19. Although machine learning researchers have been very active in their efforts to prevent COVID-19 infections, few studies have concerned drug repositioning. More importantly, only few studies have been accepted for publication or are available online in a journal although their applications against COVID-19 have become publicly available [43]. Network-based methods computed biological similarity between drugs, viruses, targets and proposed network propagation algorithms to find therapeutic strategies for COVID-19. Gysi et al. [44] developed a graph neural network-based method and identified 81 potential repositioning candidates associated with SARS-CoV-2. Peng et al. [45] constructed a single-stranded RNA Virus-Drug Association (VDA) dataset and developed a bipartite local model to find potential treatments for COVID-19. Zhou et al. [46] designed a KATZ measurement for predicting therapeutic agents of COVID-19. Following by the above methods, Peng et al. [47] constructed three VDA datasets and developed a random walk with restart to discover potential antiviral drugs. The above methods effectively combined network-based drug repositioning, molecular docking, and document retrieval and are state-of-the-art virus-drug association prediction methods.

Molecular docking and dynamics simulation

To design novel inhibitors for various diseases, computational methods such molecular docking and MDS have been broadly carried out [48,49]. Molecular docking is a powerful technique widely applied to model the interactions between ligands and target proteins at the atomic level [50]. We can characterize the behavior of ligands binding to target proteins and elucidate fundamental biochemical processes through molecular docking. Molecular docking contains two basic steps: prediction of the ligand conformation and its position within binding sites and computation of the ligand-protein binding affinity. A higher binding energy shows more stable ligand-protein binding capability. Similarly, the binding affinity between chemical agents and the S protein/human ACE2 can be computed by molecular docking [51]. For example, Thillainayagam et al. [52] revealed the anti-malarial feature of quinolinyl chalcone derivatives by molecular docking, three-dimensional QSAR, and comparative molecular field analysis. MDS has been used to confirm ligand-protein stable interactions [53]. For example, by molecular docking and MDS, Thillainayagam et al. [48] studied the inhibitory action of epoxyazadiradione on malaria. Ragunathan [54] et al. reported that FtsA is a cidal target of Staphylococcus aureus. Ragunathan et al. [55] found that quercetin may be a stable inhibitor of Murb. More importantly, Basu et al. [53] identified a new cyclohexanone compound, ZINC07333416, that has crucial associations with most active-site residues of the SARS-CoV-2 main protease by molecular docking and confirmed these associations through MDS.

Study contributions

In this study, we developed a VDA prediction model (VDA-RWLRLS) based on biological information of viruses and drugs, unbalanced bi-Random Walks, Laplacian Regularized Least Squares (LRLS), molecular docking, and MDS. This study has four main contributions: The unbalanced bi-random walk algorithm is designed to globally exploit the weighted circular bigraphs in the virus network and the drug network. LRLS is presented to iteratively score each virus-drug pair. Viral sequences, drug chemical structures, and VDA information are utilized to integrate the biological features of the viruses and drugs. Molecular docking and MDS are applied to validate the inferred anti-SARS-CoV-2 drugs.

Materials and methods

Experimental data

Virus- association

In this study, we use three VDA datasets provided by Peng et al. [47]. Dataset 1 was provided by Zhou et al. [46] and contains 96 confirmed VDAs between 12 human RNA viruses, including SARS-CoV-2, and 78 small molecules. Dataset 2 was obtained by manually retrieving recent documents based on the text mining technique and contains 770 confirmed VDAs between 69 viruses and 128 small molecules. Dataset 3 was constructed by preprocessing VDA data from the Drug Virus.info database [56] and consists of 407 known VDAs from 34 viruses and 203 chemical agents. The details are shown in Table 1 .

Table 1

The description of virus-drug association datasets.

Dataset	Virus	Drugs	VDAs
Dataset 1	12	78	96
Dataset 2	69	128	770
Dataset 3	34	203	407

The description of virus-drug association datasets. A VDA matrix is represented as ( (n and m are the number of viruses and drugs, respectively), where x = 1 if the i-th virus v associates with the j-th drug d and x = 0 otherwise. The adjacency matrix of a VDA network is represented as ( with the element y by Eq. (1):

Drug chemical structure similarity

Based on the assumption that two drugs sharing more chemical substructures are more similar [57], drug chemical structure similarity can be calculated. Chemical substructures were first downloaded from the DrugBank database [58]. An open-source cheminformatics software, RDKit [59], is then applied to compute extended connectivity fingerprints [60] of drugs based on their chemical substructures. Finally the drug chemical structure similarity matrix was calculated based on the extended connectivity fingerprints.

Virus sequence similarity

The complete genomic sequences of viruses were obtained from the NCBI database [61], and the virus sequence similarity matrix was computed based on MAFFT [62], a multiple sequence alignment software.

Methods

Problem description

A heterogeneous virus-drug network is comprised of a virus similarity network, a drug similarity network and VDA data. Let (, ( and ( denote the affinity matrices of the virus similarity network, drug similarity network, and VDA network, respectively. The virus similarity network is constructed by using viruses as vertices, virus-virus similarities as edges, and similarity between two viruses as the weight of their linked edge. Similarly, a drug similarity network is built. The VDA network is the constructed VDA matrix. Our objective is to find missing VDAs on the heterogeneous network by reconstructing an association matrix (.

Overview of VDA-RWLRLS

In this study, we developed a VDA prediction framework, VDA-RWLRLS, based on various biological information, unbalanced bi-random walk, LRLS, molecular docking, and MDS. Fig. 1 shows the flowchart of VDA-RWLRLS. As shown in Fig. 1, the VDA-RWLRLS framework contains six main steps. (1) Similarity computation. Viral similarities are fused based on genomic sequences and Gaussian association profiles. Drug similarities are integrated based on chemical structures and Gaussian association profiles. (2) Label propagation. An unbalanced bi-random walk is designed to propagate association labels based on the left random walk on the virus network and the right random walk on the drug network. (3) Virus-drug association probability computation. LRLS is applied to compute association probabilities for unknown virus-drug pairs based on the results obtained from unbalanced bi-random walks. The prediction scores from the virus network and the drug network are averaged as the results at the current iteration. (4) Iteratively updating. Steps (2) and (3) are iteratively updated t times and the final predicted association scores ( ) are computed. (5) Molecular docking. Molecular docking was employed to measure the binding abilities between the predicted top anti-SARS-CoV-2 drugs and the crystal structure of the S protein-binding domain bound to human ACE2. (6) MDS. MDS was used to evaluate the interaction-dynamics of the predicted possible anti-COVID-19 drug-target proteins of the COVID-19 complex.

Fig. 1

The flowchart of VDA prediction based on an unbalanced bi-random walk, LRLS, molecular docking, and MDS (VDA-RWLRLS).

Gaussian association profile kernel similarity

Considering that any two viruses associated with more common drugs could tend to share higher similarity, we utilize Gaussian Association Profile Kernels (GAPK) [63] to compute virus and drug GAPK similarity matrices. The association profile of a virus v , denoted as (v ) (the i-th row of ), describes the relationship between virus v and all observed drugs. For two viruses v and v , their GAPK similarity is computed by Eq. (2).where is the bandwidth parameter. Similarly, drug GAPK similarity can be defined by Eq. (3).where is the bandwidth parameter. (d ) is the j-th column of and represents the correlation between drug d and all observed viruses. Sequence similarity and GAPK similarity are integrated to measure virus similarity by Eq. (4). Similarly, the fused drug similarity can be calculated by Eq. (5).

Balanced random walk

Suppose that a circular bigraph is represented as a subgraph comprised of a virus path and a drug path with two ends connected by two VDAs and . The circular bigraph denotes a vicinity relationship between two VDAs and by generalizing the similarity between v 1 and v , and one between d 1 and d . The smallest circular bigraph can be represented based on the “guilt-by-association” principle: The triangle with one virus and two drugs follows the assumption that similar drugs may be applied to the same disease. The rectangle with two viruses and two drugs follows the assumption that virus v tends to associate with drug d when virus v associates with drug d , v is similar to v and d is similar to d . The triangle with two viruses and one drug follows the assumption that diseases caused by similar viruses may be treated by the same drug. A Balanced Bi-Random Walk method (VDA-BiRW) is designed to identify potential VDAs by investigating circular bigraphs with high frequency in a VDA network. By multiplying on the left and on the right, VDA-BiRW iteratively extends the virus path and the drug path to identify the circular bigraphs by Eq. (6). In Eq. (6), both and are normalized to make the sum of all elements 1. Initially, 0 is equal to . The known VDA matrix is integrated to balance the bi-random walk and the prior knowledge. Parameter has two functions: (I) Decreasing the importance of a circular bigraph when the path is getting longer. (II) Balancing potential VDAs and known VDAs. Successive matrix multiplication is used to implement mimic jumps on the virus network, the drug network, and the VDA network. In the first walking, the -th element in the matrix denotes the number of circular bigraphs obtained by linking the i-th target virus with the path length of 1 to the j-th candidate drug with the path length of 1. When α = 1, after t steps of successive multiplication , we can find circular bigraphs with a path length of t.

Unbalanced Bi-random walk

Theoretically, the above balanced random walk model can converge to a unique solution. However, circular bigraphs with small path lengths are more informative for VDA identification. In particular, circular bigraphs with long path lengths could result in false positives. More importantly, the virus similarity network and the drug similarity network contain different topological structures, and thus the optimal walking step size could be different in the process of random walk on the two networks. Therefore, we develop an unbalanced bi-random walk-based VDA prediction method by introducing two parameters as the maximal iteration numbers on the left and right random walks on the two networks, respectively: where Eqs. (7), (8) denote the left random walk with the step size of l on the virus network, and the right random walk with the step size of r on the drug network, respectively. At each step, virus similarity is integrated to the left random walk by multiplying on the left of Eq. (7). Similarly, drug similarity is integrated by Eq. (8).

Laplacian Regularized Least Squares

In the unbalanced bi-random walk algorithm, the jump condition depends on the computed similarity matrices and known VDAs. For a given node in a VDA network, if there exist two nodes that have the same similarity with the given node, the two nodes will equally contribute to the direction of the jump. However, the node with relatively smaller similarities with the other nodes may have more contribution. Therefore, LRLS [64] is introduced to solve the above problem. First, the virus Laplacian matrix and drug Laplacian matrix are normalized to measure the jump for each node by Eqs. (9), (10). where and are the diagonal matrices of viruses and drugs whose elements (i, i) and (j, j) represent the summation of the i-th row of and the j-th row of , respectively. Second, the cost function in the virus space and the drug space can be defined to describe the minimum optimization problem based on Laplacian matrices and by Eqs. (11) and (12), respectively: where ‖ ⋅‖ represents the Frobenius norm, (⋅) denotes the transpose, and η and η are trade-off parameters. The above two models (11) and (12) can be solved by Eqs. (13), (14)), respectively: To comprehensively consider the effect of unbalanced random walk on the prediction performance, we replace with association scores obtained from unbalanced random walks. Suppose that Eqs. (15) and (16) be defined as follows: At the t-th walking, the corresponding models integrating unbalanced random walk and LRLS can be defined by Eqs. (17), (18), respectively. Finally, VDA-RWLRLS computes the association probability for each virus-drug pair by integrating the interaction scores from the virus network and the drug network by Eq. (19): Molecular docking has been widely applied to measure the intermolecular binding affinity between one ligand-receptor pair. We implemented molecular docking to evaluate the binding abilities between the predicted anti-SARS-CoV-2 drugs and the S protein/human ACE2. Drug chemical structures can be obtained from the DrugBank database [58] and are represented based on the PDB format files. AutoDockTool is used to convert the PDB files into the pdbqt format files required by AutoDock 4. The structure of the S protein-binding domain bound to human ACE2 (PDB ID: 6M0J) was downloaded from the RCSB Protein Data Bank [65]. The predicted anti-SARS-CoV-2 compounds are used as ligands, and the binding domain of the S protein bound to human ACE2 is used as a receptor to implement molecular docking. We used PyMOL [66] to remove solvent and organic compounds. Receptor atoms are set as the AD4 type and Gasteiger charges are computed before docking. Binding pockets are defined using AutoGrid 4. The grid size is set as 126 × 126 × 126 with a spacing of , and the grid center is placed at the area of the S protein-binding domain bounding with ACE2. AutoDock [67] is finally applied to molecular docking where the Lamarckian genetic approach with default parameters [68] is used as the search algorithm. Basu et al. [53] employed MDS and evaluated the interaction dynamics of one ligand-protein complex for COVID-19. Inspired by MDS proposed by Basu et al. [53], we carried out MDS to analyze our predicted anti-SARS-CoV-2 drugs after molecular docking. We employ GROMACS 2021-2 for MDS with a Charmm36 force field [69]. First, we obtained topological files of ligands through CGENFF [63]. Second, we generated topological files of receptors by ignoring H atoms and using the tip 3-point for the water model based on gromacs. Third, the ligand-protein complex is placed in a dodecahedral box filled with simple point charge water, distance between the solvent and the box is set to 1 nm, and ions including Na or Cl are added to the solvation system to balance the charge. Fourth, the energy is minimized to the initial structure before simulation by the steepest descent minimization method. In the process, the iteration stop condition is set as the maximum force with less than 1000 kJ/mol nm−1 and the maximum number of iterations is set as 50 000. Five, the solvents and ions around the complex are balanced. The process is divided into two stages: NVT (constant number of particles, constant volume, and constant temperature) and NPT (constant number of particles, constant pressure P, and constant temperature). In NVT stage, we set the target temperature to 300 K and perform 100 ps simulation where unit time is set to 2 fs and the maximum number of iterations is set to 50 000. In the NPT stage, we use the same 100 ps balance, and the reference pressure is set to 1 bar. After balancing the density of the system, we conduct an MDS of 50 ns.

Results

Experimental setting

We perform 5-fold Cross Validations (CVs) to investigate the performance of our proposed VDA-RWLRLS method. 80% of the VDAs of Y were randomly extracted as the training set, and the remaining VDAs as testing set. Sensitivity, specificity, accuracy, F1 score, and AUC were used as evaluation metrics. Sensitivity and specificity denote the ability of a VDA identification model to correctly predict positive VDAs and negative VDAs, respectively. Accuracy denotes the ratio of the correctly predicted positive and negative VDAs to all known positive VDAs and screened negative VDAs. The F1 score is a harmonic mean between precision and recall. Precision is the ratio of the correctly predicted positive VDAs to all predicted positive VDAs. Recall is the same for sensitivity. Let TP, FP, TN, and FN indicate true positive, false positive, true negative, and false negative, respectively. The definitions of sensitivity, specificity, accuracy, and F1 score are by Eqs. (20), (21), (22), (23), (24), (25): where AUC is widely applied to evaluate the performance of a binary classifier. It denotes the area under the Receiver Operating Characteristic (ROC) curve [70,71]. The ROC curve is a plot that characterizes the trade-off between sensitivity and (1-specificity) or between the true positive rate and false positive rate across a few cut-off points. Higher sensitivity, specificity, accuracy, F1 score and AUC indicate better performance for VDA prediction models. In addition, the AUC is a more important measurement than the other four evaluation metrics. The experiments are repeated 20 times, and the final performance is achieved by averaging the results from 20 experiments.

Parameter settings

There are seven parameters used in VDA-RWLRLS: α, l, r, λ , λ , and γ /γ . The parameter α is used to control the importance between the predicted VDAs and known VDAs during random walks. l and r control the iteration steps on the left and right random walks, respectively. λ and λ measure the importance of Gaussian association profile similarity and virus sequence similarity/drug chemical structure similarity. γ and γ are bandwidth parameters. η and η are set as 0.01. We use grid search to obtain the optimal parameter combination. Table 2 shows the optimal parameter settings where VDA-RWLRLS computes the best performance.

Table 2

The parameter settings of VDA-RWLRLS.

Dataset	α	l	r	λ_v	λ_d	γ_d/γ_v
dataset1	0.3	11	11	0.1	0.1	2.5
dataset2	0.001	31	1	0.5	0.5	2.5
dataset3	0.001	11	1	0.1	0.1	2.5

The parameter settings of VDA-RWLRLS.

Comparison with six state-of-the-art methods

The proposed VDA-RWLRLS method is compared with six VDA prediction methods (NGRHMDA [72], SMiR-NBI [73], LRLSHMDA [64], VDA-KATZ [46], VDA-RWR [47], and VDA-BiRW) on three VDA datasets. NGRHMDA is a recommendation-based microbe-disease association prediction approach. SMiR-NBI is a network inference-based method applied to identify biomarkers of cancers. LRLSHMDA is an LRLS classifier with Gaussian interaction profile kernel similarity. The three methods computed good accuracy in the corresponding application area. VDA-KATZ and VDA-RWR are the two newest VDA prediction methods and obtained better performance. VDA-BiRW is a balanced random walk-based VDA identification model. Table 3 illustrates the comparison results of seven VDA prediction models on three VDA datasets. The experimental results show that VDA-RWLRLS obtains better AUCs on the three datasets. We predicted the top 10 drugs associated SARS-CoV-2 and found that two small molecules, remdesivir and ribavirin, came together in three datasets, and three small molecules, nitazoxanide, favipiravir, and niclosamide, came together in two datasets. Molecular docking and MDS were implemented between the five drugs and the crystal structure of the S protein-binding domain bound with ACE2. The results suggest that remdesivir, ribavirin, nitazoxanide, and niclosamide have higher binding abilities with the crystal structure.

Table 3

The performance of seven VDA prediction methods on three datasets.

Datasets	Methods	Sensitivity	Specificity	F1 score	Accuracy	AUC
Dataset 1	NGRHMDA	0.578 3	0.556 7	0.061 5	0.557 2	0.645 9
	SMiR-NBI	0.833 1	0.193 6	0.038 5	0.207 9	0.572 3
	LRLSHMDA	0.803 4	0.581 3	0.111 9	0.586 3	0.840 3
	VDA-KATZ	0.697 6	0.668 4	0.151 7	0.669 1	0.880 3
	VDA-RWR	0.482 4	0.783 1	0.115 3	0.827 8	0.858 2
	VDA-BiRW	0.832 3	0.636 8	0.133 2	0.641 1	0.876 5
	VDA-RWLRLS	0.562 6	0.838 0	0.225 9	0.831 9	0.885 8
Dataset 2	NGRHMDA	0.454 4	0.356 2	0.021 8	0.358 1	0.301 1
	SMiR-NBI	0.834 9	0.094 2	0.033 6	0.108 1	0.415 6
	LRLSHMDA	0.783 8	0.484 0	0.073 3	0.489 6	0.824 8
	VDA-KATZ	0.551 2	0.757 4	0.080 5	0.753 5	0.829 6
	VDA-RWR	0.502 2	0.664 3	0.057 4	0.661 3	0.667 5
	VDA-BiRW	0.557 4	0.752 4	0.110 5	0.748 7	0.832 2
	VDA-RWLRLS	0.513 3	0.826 4	0.123 2	0.820 5	0.835 5
Dataset 3	NGRHMDA	0.358 2	0.408 1	0.011 9	0.407 4	0.255 4
	SMiR-NBI	0.923 0	0.042 7	0.023 0	0.053 6	0.436 5
	LRLSHMDA	0.812 9	0.523 9	0.055 2	0.527 5	0.816 9
	VDA-KATZ	0.711 6	0.566 6	0.062 6	0.568 4	0.847 8
	VDA-RWR	0.505 3	0.705 7	0.055 6	0.703 2	0.712 3
	VDA-BiRW	0.707 8	0.574 1	0.072 6	0.575 8	0.851 1
	VDA-RWLRLS	0.519 8	0.843 8	0.118 9	0.844 6	0.862 5

The performance of seven VDA prediction methods on three datasets. VDA-RWLRLS obtains better specificity, F1 score, accuracy, and AUC than the other six methods on the three datasets. On dataset 1, for example, the AUCs obtained from VDA-RWLRLS are 27.08%, 35.39%, 5.14%, 0.62%, 3.12%, and 1.05% better than NGRHMDA, SMiR-NBI, LRLSHMDA, VDA-KATZ, VDA-RWR, and VDA-BiRW, respectively. On dataset 2, the AUCs computed by VDA-RWLRLS are 63.96%, 50.26%, 1.28%, 0.71%, 20.11%, and 0.39% better than those of six methods, respectively. On dataset 3, the AUC calculated by VDA-RWLRLS increases 70.39%, 49.39%, 5.29%, 1.70%, 17.41%, and 1.32%, respectively. Fig. 2 illustrates the AUC values achieved from the seven VDA prediction methods on the three datasets. The AUC and F1 score can better evaluate the performance of the association prediction methods compared to the other three measurements. VDA-RWLRLS computes the best F1 scores and AUCs on the three VDA datasets. Therefore, VDA-RWLRLS can be effectively applied to rank unknown virus-drug pairs at a more accurate rate based on known VDAs.

Fig. 2

The AUC values predicted by seven VDA prediction methods on three VDA datasets.

The AUC values predicted by seven VDA prediction methods on three VDA datasets. More importantly, VDA-KATZ and VDA-RWR are two of the newest VDA prediction models and obtained good prediction performance. VDA-KATZ used the KATZ measurement, and VDA-RWR applied the random walk with restart on the constructed heterogeneous virus-drug network. Compared to the two state-of-the-art methods, VDA-RWLRLS boosts the prediction performance under the majority of conditions on the three datasets. Therefore, VDA-RWLRLS improves the VDA prediction accuracy. VDA-RWR, VDA-BiRW and VDA-RWLRLS use bi-random walk for VDA prediction. The three methods use random walk with restart, balanced bi-random walk, and unbalanced bi-random walk. VDA-RWLRLS significantly outperforms VDA-RWR and VDA-BiRW. The results demonstrate the contributions of LRLS to VDA classification ability. LRLSHMDA is an LRLS-based association prediction method. VDA-RWLRLS obtains better performance than LRLSHMDA on the three datasets. The results may be due to the left random walk on the virus network and the right random walks on the drug network: thus, considering the different maximal random walking steps based on two sides of the networks may boost the VDA prediction performance. In summary, integrating unbalanced bi-random walk and LRLS helps to mine underlying VDAs. In addition, all unknown drug-virus pairs were chosen as negative samples. Sensitivity was defined as TP/(TP + FN). In datasets 1, 2, and 3, there were 96, 770, and 407 known VDAs, respectively, and 840, 8062, and 6495 unknown virus-drug pairs, respectively. The ratios of known VDAs to unknown virus-drug pairs in the three VDA datasets were 11.43%, 9.55%, and 6.27%, respectively. Thus, FN is much greater than TP because of the unbalanced characteristics between negative VDAs and positive VDAs. Therefore, the sensitivity and F1 score are relatively lower for VDA prediction algorithms. In addition, on dataset 1, NGRHMDA and SMiR-NBI calculated AUCs larger than 0.5. However, they computed AUCs smaller than 0.5 on datasets 2 and 3. In contrast, if we re-evaluate the logical direction of the cut-off points, NGRHMDA and SMiR-NBI will obtain AUCs larger than 0.5 on datasets 2 and 3: however, AUCs from the two methods will be smaller than 0.5 on dataset 1. This may be caused by the poor generalization ability of NGRHMDA and SMiR-NBI.

Case study

We aimed to identify possible clues of treatment for COVID-19 after confirming the performance of VDA-RWLRLS. Table 4, Table 5, Table 6 list the top 10 predicted drugs against SARS-CoV-2 in the three datasets. The majority of the top 10 predicted antiviral drugs have been supported by recent literature. The powerful results indicate the prediction confidence of VDA-RWLRLS. Among the inferred antiviral drugs, there are two small molecules coming in three VDA datasets, that is, remdesivir and ribavirin. In addition, three chemical agents were combined into two VDA datasets: nitazoxanide, favipiravir, and niclosamide.

Table 4

The predicted top 10 drugs against SARS-CoV-2 on dataset 1.

Rank	Drug	Evidence
1	Remdesivir	PMID: 32 020 029, 31 996 494, 32 022 370, 31 971 553, 32 035 018, 32 035 533, 32 036 774, 32 194 944, 32 275 812, 32 145 386, 32 838 064
2	Oseltamivir	PMID: 32 034 637, 32 127 666
3	Zanamivir	PMID: 32 511 320
4	Ribavirin	PMID: 32 034 637, 32 127 666, 32 227 493, 26 492 219,32 771 797
5	Presatovir	PMID: 32 147 628
6	Elvitegravir	PMID: 32 147 628
7	Zidovudine	PMID: 32 568 013
8	Emtricitabine	PMID: 32 488 835
9	Laninamivir	https://arxiv.org/abs/2009.10333
10	Peramivir	unconfirmed

Table 5

The predicted top 10 drugs against SARS-CoV-2 on dataset 2.

Rank	Drug	Evidence
1	Favipiravir	PMID: 32 346 491, 32 967 849, 32 972 430
2	Niclosamide	PMID: 32 125 140, 32 221 153
3	Remdesivir	PMID: 32 020 029, 31 996 494, 32 022 370, 31 971 553, 32 035 018, 32 035 533, 32 036 774, 32 194 944, 32 275 812, 32 145 386, 32 838 064
4	Cyclosporine	PMID: 32 777 170, 32 505 466
5	Nitazoxanide	PMID: 32 127 666, 32 568 620, 32 448 490
6	Mycophenolic acid	PMID: 32 579 258
7	BCX4430 (Galidesivir)	PMID: 32 711 596
8	Emetine	PMID: 32 251 767,32 278 693,32 340 120
9	Amodiaquine	PMID: 32 246 834, 32 834 612, 32 631 083, 32 317 408
10	Ribavirin	PMID: 32 034 637, 32 127 666, 32 227 493, 26 492 219,32 771 797

Table 6

The predicted top 10 drugs against SARS-CoV-2 on dataset 3.

Rank	Drug	Evidence
1	Ribavirin	PMID: 32 034 637, 32 127 666, 32 227 493, 26 492 219, 32 771 797
2	Nitazoxanide	PMID: 32 127 666, 32 568 620, 32 448 490
3	Chloroquine	PMID: 32 020 029, 32 145 363, 32 074 550, 32 236 562
4	Camostat	PMID: 32 347 443
5	Umifenovir	PMID: 32 941 741
6	Favipiravir	PMID: 32 346 491, 32 967 849, 32 972 430
7	Amantadine	PMID: 32 361 028
8	Niclosamide	PMID: 32 125 140, 32 221 153
9	Remdesivir	PMID: 32 020 029, 31 996 494, 32 022 370, 31 971 553, 32 035 018, 32 035 533, 32 036 774, 32 194 944, 32 275 812, 32 145 386, 32 838 064
10	Berberine	PMID: 33 670 363

The predicted top 10 drugs against SARS-CoV-2 on dataset 1. The predicted top 10 drugs against SARS-CoV-2 on dataset 2. The predicted top 10 drugs against SARS-CoV-2 on dataset 3. Remdesivir is a triphosphate analog first reported as a potential therapeutic agent against Ebola. The drug has broad antiviral activity against the Arenaviridae, Coronaviridae, Filoviridae, Flaviviridae, Paramyxoviridae, and Pneumoviridae viral families. It has been confirmed as a non-obligate chain terminator of RNA-dependent RNA polymerase from SARS-CoV-2 and investigated in many COVID-19 clinical trials. On November 19, 2020, it was authorized for the treatment of COVID-19 in combination with baricitinib [58]. Ribavirin is an antiviral drug. This drug can inhibit viral RNA synthesis and mRNA capping. Thus it has broad-spectrum activity against a few RNA and DNA viruses. Currently, dual therapy with ribavirin and peginterferon alfa-2a/peginterferon alfa-2b is recommended as the first-generation and standard antiviral treatment [58].

Molecular docking

We used AutoDock 4.2, a molecular docking software, to conduct molecular docking between the predicted five antiviral drugs and the crystal structure of the S protein-binding domain bound with ACE2. The chemical agents are taken as ligands and the crystal structure is taken as the receptor. Molecular binding energy was used to measure the binding stabilities between the predicted five antiviral drugs and the crystal structure. Fig. 3 shows the molecular docking between remdesivir and ribavirin and the crystal structure. The results show that remdesivir has a binding energy of −7.00 kcal/mol with the crystal structure and that binding sites are K68 with ACE2 and Q493 with the S protein. Ribavirin has a binding energy of −6.59 kcal/mol, and its binding sites are K353 with ACE2 and G496, Q493, and R403 with the S protein. Molecular docking results for the other three drugs, nitazoxanide, favipiravir, and niclosamide and the crystal structure are illustrated in Figs. 1–3 in Supplementary Materials 1. The corresponding binding energies and sites are shown in Table 1 in Supplementary Materials 1.

Fig. 3

Molecular docking between remdesivir and ribavirin and the crystal structure of the S protein-ACE2 binding domain.

Molecular dynamics simulation

After performing molecular docking, we continued to conduct MDAs between the predicted possible drugs and the S protein, human ACE2, and the crystal structure of the S protein-binding domain bound with ACE2. Fig. 4 illustrates Root Mean Square Deviation (RMSD) trajectories of ACE2 and remdesivir, Radius of gyration (Rg) pattern of the remdesivir-ACE2 complex, Root Mean Square Fluctuation (RMSF) pattern of ACE2, and H-bond during MDS, and Solvent Accessible Surface Area (SASA) of the remdesivir-ACE2 complex backbone. From Fig. 4, we can observe that the RMSD trajectory of ACE2 (Fig. 4 (a)) obtains equilibrium beyond 10 ns with an average value of approximately 0.25 nm. The RMSD of remdesivir (Fig. 4 (b)) is approximately stable during MDS, with an average value of approximately 0.6 nm. The two average values indicate that the relative change in ligand position is less than that of human ACE2, thereby demonstrating the stability of the remdesivir-human ACE2 pose.

Fig. 4

MDS between remdesivir and human ACE2 protein during 50ns.

MDS between remdesivir and human ACE2 protein during 50ns. The lower mean value (2.5 nm) and stable trajectory of Rg (Fig. 4(c)) depict the compactness of the remdesivir-ACE2 complex during the simulation. In Fig. 4(d), the SASA of the complex is 0.1–0.5 nm2, indicating that the hydrophobic core is compact and that the conformational geometry of the complex is stable during MDS. In Fig. 4(e), remdesivir and ACE2 interact with 1–5 hydrogen bonds; however, only two bonds are observed to be consistent throughout MDA. In Fig. 4(f), RMSF is applied to measure the number of positional fluctuations of each residue in the remdesivir-ACE2 backbone during MDS. The RMSF value in the backbone is in the range of 0.05–0.4 nm with an average of approximately 0.15 nm and minimum fluctuations of the key active-site residues. The other MDSs between the predicted ligands (remdesivir, ribavirin, nitazoxanide, favipiravir, and niclosamide) and three receptors (the S protein, human ACE2, and the crystal structure of the S protein-ACE2 binding domain) under 50 ns and 10 ns are illustrated in Figs. 1–25 in Supplementary Materials 2.

Discussion and conclusion

The key to preventing the transmission of SARS-CoV-2 lies in a deep understanding of associations among viruses, target proteins and targeting drugs. Laboratory techniques may not be realistic under such an emergent situation. Drug repositioning is a more powerful approach to find possible clues for treatment of COVID-19. Identification of potential associations between viruses including SARS-CoV-2 and post-marked drugs through drug repositioning, provides an effective way to prioritize chemical agents related to SARS-CoV-2. Toward this goal, we first analyzed the pattern of associations between viruses and drugs and observed that the majority of known VDAs are circular bigraphs with short path lengths. This result provides the foundation for new VDA prediction based on known linkages in a VDA network. An Unbalanced bi-random walk is specifically developed to mine circular bigraphs and thus reconstructs associations between viruses and chemical agents. In addition, during an unbalanced bi-random walk, the jumps of nodes severely depend on virus similarities, drug similarities and known VDAs. To reduce the limitations, LRLS is integrated into the unbalanced bi-random walk model. VDA-RWLRLS is compared to six classical VDA prediction models. The experimental results show that VDA-RWLRLS outperforms the six association prediction methods. The novelty of VDA-RWLRLS comes from the combination of unbalanced bi-random walk, LRLS, molecular docking, and MDS. Our proposed VDA-RWLRLS algorithm performs consistently better on three constructed datasets. It may be contributed to by the following four features. First, VDA-RWLRLS uses the unbalanced bi-random walk method and takes advantage of different combinations in left and right random walks. Second, VDA-RWLRLS globally explores the weighted circular bigraphs by bi-random walk on the virus network and the drug network. The number of circular bigraphs affects the probability of a virus associated with a drug. Third, LRLS is utilized to decrease the drawback of unbalanced bi-random walk. Finally, VDA-RWLRLS computes the virus similarity and drug similarity by fusing biological information and association information in a VDA network. In addition, morbidity and mortality rates in the COVID-19 pandemic are sharply increasing because of bacterial and fungal co-infections. However, there is currently a lack of efficient clinical trials to screen antibacterial and antifungal compounds in COVID-19 patients [74]. Therefore, it is urgent to boost the knowledge about antibacterial and antifungal drugs to optimize the prevention, diagnosis, and therapeutic strategies in the COVID-19 pandemic. In the future, we will further mine drug banks and pharmacological resources and build two datasets, that is, datasets involved in associations between antibacterial drugs and single stranded RNA virus-caused diseases and datasets involved in associations between antifungal drugs and single stranded RNA virus-caused diseases, to initially screen small molecules for COVID-19 treatment.

Author contributions

LS, L-HP, F-XL, and L-QZ proposed computational models, L-HP and LS wrote the paper, L-HP and L-QZ revised original draft, LS run the computational models, LS conducted molecular docking, LS, HL and G-YL performed MDS, LS, L-HP, and L-QZ discussed the computational models and gave conclusion. All authors read and approved the final manuscript.

Funding

This research was funded by the (Grant 61803151, 62072172, 62172158), the Natural Science Foundation of Hunan province (Grant 2021JJ30219), scientific research project of Hunan Provincial Department of Education (20C0636), scientific research and innovation Foundation of Hunan University of Technology (Grant CX2031).

Data availability

Source code and dataset are freely downloadable at https://github.com/plhhnu/VDA-RWLRLS/. PDB ID of ACE2: 6M0J, DrugBank ID of Remdesivir: DB14761, Ribavirin: DB00811, Nitazoxanide: DB00507, Favipiravir: DB12466, Niclosamide: DB06803.

Declaration of competing interest

All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

10 in total

Review 1. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2.

Authors: Kaifu Gao; Rui Wang; Jiahui Chen; Limei Cheng; Jaclyn Frishcosy; Yuta Huzumi; Yuchi Qiu; Tom Schluckbier; Xiaoqi Wei; Guo-Wei Wei
Journal: Chem Rev Date: 2022-05-20 Impact factor: 72.087

2. Modeling Kaempferol as a Potential Pharmacological Agent for COVID-19/PF Co-Occurrence Based on Bioinformatics and System Pharmacological Tools.

Authors: Yong Jiang; Yi-Zi Xie; Chen-Wen Peng; Kai-Nan Yao; Xue-Ying Lin; Shao-Feng Zhan; Hong-Fa Zhuang; Hui-Ting Huang; Xiao-Hong Liu; Xiu-Fang Huang; Hang Li
Journal: Front Pharmacol Date: 2022-06-08 Impact factor: 5.988

3. Hyb4mC: a hybrid DNA2vec-based model for DNA N4-methylcytosine sites prediction.

Authors: Ying Liang; Yanan Wu; Zequn Zhang; Niannian Liu; Jun Peng; Jianjun Tang
Journal: BMC Bioinformatics Date: 2022-06-29 Impact factor: 3.307

4. Predicting circRNA-drug sensitivity associations via graph attention auto-encoder.

Authors: Lei Deng; Zixuan Liu; Yurong Qian; Jingpu Zhang
Journal: BMC Bioinformatics Date: 2022-05-04 Impact factor: 3.307

5. GC-MS profiling of Bauhinia variegata major phytoconstituents with computational identification of potential lead inhibitors of SARS-CoV-2 M^pro.

Authors: Pallavi More-Adate; Kiran Bharat Lokhande; K Venkateswara Swamy; Shuchi Nagar; Akshay Baheti
Journal: Comput Biol Med Date: 2022-06-01 Impact factor: 6.698

6. Inferring Latent Disease-lncRNA Associations by Label-Propagation Algorithm and Random Projection on a Heterogeneous Network.

Authors: Min Chen; Yingwei Deng; Ang Li; Yan Tan
Journal: Front Genet Date: 2022-02-04 Impact factor: 4.599

7. Discovery of Potential Therapeutic Drugs for COVID-19 Through Logistic Matrix Factorization With Kernel Diffusion.

Authors: Xiongfei Tian; Ling Shen; Pengfei Gao; Li Huang; Guangyi Liu; Liqian Zhou; Lihong Peng
Journal: Front Microbiol Date: 2022-02-28 Impact factor: 5.640

8. Finding Lung-Cancer-Related lncRNAs Based on Laplacian Regularized Least Squares With Unbalanced Bi-Random Walk.

Authors: Zhifeng Guo; Yan Hui; Fanlong Kong; Xiaoxi Lin
Journal: Front Genet Date: 2022-07-22 Impact factor: 4.772

9. Identifying shared genetic loci between coronavirus disease 2019 and cardiovascular diseases based on cross-trait meta-analysis.

Authors: Hongping Guo; Tong Li; Haiyang Wen
Journal: Front Microbiol Date: 2022-09-15 Impact factor: 6.064

10. Prioritizing potential circRNA biomarkers for bladder cancer and bladder urothelial cancer based on an ensemble model.

Authors: Qiongli Su; Qiuhong Tan; Xin Liu; Ling Wu
Journal: Front Genet Date: 2022-09-15 Impact factor: 4.772

10 in total