An in silico drug repositioning workflow for host-based antivirals.

Zexu Li^1,2, Yingjia Yao^1,2, Xiaolong Cheng^3,4, Wei Li^3,4, Teng Fei^1,2.

Abstract

Drug repositioning represents a cost- and time-efficient strategy for drug development. Artificial intelligence-based algorithms have been applied in drug repositioning by predicting drug-target interactions in an efficient and high throughput manner. Here, we present a workflow of in silico drug repositioning for host-based antivirals using specially defined targets, a refined list of drug candidates, and an easily implemented computational framework. The workflow described here can also apply to more general purposes, especially when given a user-defined druggable target gene set. For complete details on the use and execution of this protocol, please refer to Li et al. (2021).

Entities: Chemical

Keywords: Bioinformatics; High Throughput Screening; Immunology; Microbiology; Molecular Biology; Structural Biology

Substances:
Antiviral Agents

Year: 2021 PMID: 34286288 PMCID: PMC8273420 DOI: 10.1016/j.xpro.2021.100653

Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667

Before you begin

Overview

Artificial intelligence-based algorithms have been applied in drug repositioning and related fields (Hao et al., 2016; Pushpakom et al., 2019; Tanoli et al., 2021; Wang et al., 2020; Yang et al., 2020; Zhou et al., 2020). The protocol below describes the specific steps of in silico drug repositioning for antivirals against Coronaviridae family viruses, including SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), SARS-CoV (severe acute respiratory syndrome coronavirus) and MERS-CoV (Middle East respiratory syndrome coronavirus). It uses a Coronaviridae-specific host dependency gene set, a refined drug candidate list covering 2457 marketed drugs and 1062 natural compounds, and the DeepCPI algorithm for drug-target interaction (DTI) prediction. The workflow can also be extended to broader drug repositioning purposes, given a user-defined target gene set, a custom list of candidate drug chemicals, and additional DTI prediction algorithms.

For drug repurposing against Coronaviridae family viruses, we first define the gene set for candidate drugs to target. In addition to the limited number of virus-specific genes, host dependency genes (HDGs), whose loss-of-function renders the host resistant to a specific viral infection, may serve as an ideal target gene pool through which inhibitory drugs exert antiviral effects. Public datasets derived from functional genetic screens using techniques such as gene-trap, RNA interference (RNAi) and clustered regularly interspaced short palindromic repeats (CRISPR) provide a wealth of information about virus-specific HDGs. We collected Coronaviridae-specific HDGs in our previous study (Li et al., 2021) and use them as the target gene set in this protocol. HDGs for a broader range of RNA viruses can also be found in Li et al. (2021).

For the drug candidates, we build a chemical cohort by collecting 2457 Food and Drug Administration (FDA) approved drugs (Database: DrugBank, version 5.1.7, released 2020-07-02; https://www.drugbank.ca) and 1062 selected natural compounds from herbs of traditional Chinese medicine with favorable druggability (Li et al., 2021). This refined drug candidate list does not include experimental or investigational chemicals. Because FDA approved drugs and herbs of traditional Chinese medicine have already been used in humans, this refined cohort may represent the safest drug candidates to be readily tested in clinical trials.

Precise and efficient DTI prediction is central to successful drug repositioning. Multiple artificial intelligence-based algorithms have been developed to predict DTIs between multiple drugs and targets. In this protocol, we employ DeepCPI, a computational framework using feature embedding and deep learning, for DTI prediction (Wan et al., 2019). Compared to other pipelines, DeepCPI is computationally efficient and can be run on a personal computer while maintaining decent predictive power. For example, in the current protocol, DeepCPI can be run on a MacBook Pro with 8 GB of memory to predict 405,405 hypothetical DTI pairs (2457 drugs × 165 targets) in about 1 h. Each drug-target pair is scored by DeepCPI for its potential interaction, and repurposed drug candidates are then prioritized according to their targeting range (the number of predicted targets) and targeting strength for the interrogated targets (targeting potential reflected by the DTI score). For the top ranked drug candidates, molecular docking analysis is performed to examine the binding interface and free energy of the potential drug-target interaction more closely. The workflow generates a ranked list of potential repurposed drug candidates against Coronaviridae viruses that are ready for in-depth experimental and clinical evaluation.

Software setup and installation

Timing: ∼1 day

A personal computer with a Linux- or Unix-based operating system is required to execute this protocol. The prerequisite software (see the key resources table) can be downloaded from the corresponding websites. The accompanying user manuals provide detailed information about their functions and uses.

Set up the operating environment for DeepCPI.
Requirement: Python 2.7, Keras=1.2.2, Gensim=0.10.2, Tensorflow=1.2.0, RDKit.
The source code of DeepCPI can be downloaded from https://github.com/FangpingWan/DeepCPI.
We also recommend installing conda (an environment management system) (https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).

Install DeepCPI using the command line under a Unix or Linux system. Open Terminal and type:
    conda create -n DeepCPI python=2.7    (#create a Python 2.7 environment)
    source activate DeepCPI    (#activate virtual environment)
    conda install RDKit
    conda install Keras=1.2.2
    conda install Gensim=0.10.2
    cd [The path of DeepCPI]    (e.g., “cd /…/DeepCPI-master”. #Change directory to the home directory of the DeepCPI folder named “DeepCPI-master”)
    python DeepCPI.py    (#Run test data)
For advanced help, please see the GitHub page (https://github.com/FangpingWan/DeepCPI).

Download and install software for molecular docking analysis.
Download and install AutoDock software (http://autodock.scripps.edu; version 4.2.6) (Morris et al., 2009).
Download and install MGLTools software (http://mgltools.scripps.edu/downloads; version 1.5.6).
Download and install PyMOL software (https://pymol.org/2/, version 2.3.2, open-source project).
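Before running the test data, it can help to confirm that the active environment really contains the versions listed in the requirement line. The following is a minimal sketch, not part of the published scripts: the function name `check_environment` is hypothetical, and it assumes each package exposes a `__version__` attribute and is importable under the lowercase module names shown.

```python
import importlib

# Versions taken from the requirement line of this protocol.
# The lowercase import names are an assumption of this sketch.
REQUIRED = {"keras": "1.2.2", "gensim": "0.10.2", "tensorflow": "1.2.0"}


def check_environment(required):
    """Return {module: (status, found_version)} for each required module.

    status is "ok", "version mismatch" or "missing".
    """
    report = {}
    for name, wanted in sorted(required.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = ("missing", None)
            continue
        # Assumption: the package publishes its version as __version__.
        found = getattr(module, "__version__", None)
        status = "ok" if found == wanted else "version mismatch"
        report[name] = (status, found)
    return report
```

Calling `check_environment(REQUIRED)` inside the activated DeepCPI environment and printing the returned dictionary gives a quick overview of missing or mismatched packages before moving on to Troubleshooting Problem 1.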

Key resources table

Step-by-step method details

Define the druggable target gene set

Timing: ∼3 days

Any user-defined target gene set can be used with this protocol for more general applications. As a specific example, the definition of a target gene set against Coronaviridae viruses is shown in the following steps.

Collect public datasets to define Coronaviridae-specific HDGs. Collect the references that performed high throughput genetic perturbation screening for Coronaviridae virus resistance in human cells. In these studies, gene-trap, RNAi and CRISPR techniques are employed to perturb gene function. For example, use the search keyword “SARS-CoV-2 AND screen” to collect SARS-CoV-2 virus-related screening references from PubMed (https://pubmed.ncbi.nlm.nih.gov/). References for other Coronaviridae viruses such as MERS-CoV and SARS-CoV can be collected similarly. Pinpoint the datasets reporting viral resistance HDGs associated with these references. Also collect scattered HDGs for Coronaviridae viruses from individual publications in which specific genes are shown to be critical or essential for the complete viral life cycle (non-screen studies).

Filter the collected data to pinpoint HDGs. If a host gene or its encoded protein is shown only to physically interact with viral proteins or to be regulated by viral genes, without functional implication for the viral life cycle upon the gene’s loss-of-function, the gene is not classified as an HDG. A gene is defined as an HDG only when it meets any of the following criteria:
Its loss-of-function impedes or reduces viral infection or activity according to experimental evidence in non-screen studies.
It has been clearly classified into the HDG group in screen studies.
When the HDG group is not specified in screen studies, arbitrarily take the top ∼5% of all the interrogated genes in the positive selection list as HDGs with a custom log-fold change cutoff in CRISPR knockout or RNAi screens. For example, in a typical result output generated by the MAGeCK (Li et al., 2015; Li et al., 2014) analytic pipeline for CRISPR screens, genes can be ranked according to their negative or positive selection trend by jointly considering the log-fold change and statistical significance of their corresponding guide RNAs. HDGs can be arbitrarily defined as the top ∼5% of all the genes with a log-fold change cutoff of 1.0 (loose) or 2.0 (stringent).

Define the high confidence HDG set for Coronaviridae family viruses. As there are several independent studies and datasets for HDG identification against Coronaviridae family viruses, we only take the subset of HDGs that occur more than once among different datasets as high confidence HDGs for further analysis. A total of 165 high confidence HDGs are defined for Coronaviridae viruses (Figure 1A). After that, prepare an HDG file in the structure of “gene symbol + amino acid sequence” (Figure 1B, Supplemental File S1).
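The two selection rules above (top ∼5% with a log-fold change cutoff, then recurrence across datasets) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the input layout (a list of gene and log-fold change pairs already ranked by positive selection) and the placeholder gene names are assumptions, not the format of any particular screen output.

```python
from collections import Counter


def select_hdgs(ranked_genes, lfc_cutoff=1.0, top_fraction=0.05):
    """Pick HDGs from one screen.

    ranked_genes: list of (gene, log_fold_change) tuples, ordered with the
    strongest positive selection first (assumed to be ranked upstream).
    """
    n_top = max(1, int(len(ranked_genes) * top_fraction))
    return {gene for gene, lfc in ranked_genes[:n_top] if lfc >= lfc_cutoff}


def high_confidence_hdgs(hdg_sets, min_datasets=2):
    """Keep genes reported as HDGs in at least `min_datasets` datasets."""
    counts = Counter(gene for hdgs in hdg_sets for gene in set(hdgs))
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


# Toy example with placeholder gene names (not real screen results).
screen_a = [("GENE_%02d" % i, 3.0 - 0.1 * i) for i in range(40)]
screen_b = [("GENE_01", 2.5), ("GENE_07", 2.1)] + [
    ("OTHER_%02d" % i, 0.2) for i in range(38)
]
hdgs = high_confidence_hdgs([select_hdgs(screen_a), select_hdgs(screen_b)])
```

Raising `lfc_cutoff` to 2.0 corresponds to the stringent setting, and `min_datasets=2` mirrors the "more than once among different datasets" rule.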

Figure 1

Prepare DeepCPI input file

(A) Gene symbol list of target gene set exemplified by 165 high confidence HDGs for Coronaviridae viruses.

(B) The structure, layout and information of the text files for drugs and targets.

(C) The structure, layout and information of the merged text file generated as DeepCPI input.

In addition to PubMed, public integrated databases such as “CRISP-view” (http://crispview.weililab.org/) can also be used to search for high throughput genetic screen studies or datasets (Cui et al., 2021). In addition, virus-specific HDGs for 10 families and 29 species of RNA viruses can be downloaded from Li et al. (2021).

Define the cohort of candidate drugs or chemicals for repurposing

Timing: ∼1 day

Collect FDA approved drug information. Drug information is extracted from DrugBank (version 5.1.7, released 2020-07-02; https://www.drugbank.ca) (Wishart et al., 2018).
Open DrugBank website -> Download -> Structures -> Structure External Links -> Approved -> Download. (#Download FDA approved drug data with InChI (the IUPAC International Chemical Identifier) information from the DrugBank website)
Extract the DrugBank ID and InChI, and save them as separate files in the structure of “DrugBank ID + InChI” (Supplemental File S1). A total of 2457 FDA approved drugs are collected with InChI information. Note that the InChI value is required for DeepCPI.

Collect natural compound information. Natural compound information is downloaded from the Traditional Chinese Medicine Systems Pharmacology (TCMSP) database (version 2.3, released 2014-05-31; https://tcmspw.com/tcmsp.php), a systems pharmacology platform of Chinese herbal medicines (Ru et al., 2014).
Filter the pool of 1455 natural compounds for better druggability by requiring each candidate to pass the criteria of oral bioavailability (OB) ≥ 30.0%, drug-likeness (DL) ≥ 0.18 and blood-brain barrier (BBB) ≥ -0.30. Finally, 1062 selected natural compounds with InChI information are kept for the downstream DTI analysis.
Extract the compound ID and InChI, and save them as separate files in the structure of “compound ID + InChI” (Supplemental File S1).

The drug cohort information used in this protocol can be found in Table S1.
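The druggability filter for natural compounds is a simple conjunction of three thresholds. A minimal sketch is shown below, assuming each compound has been loaded as a dictionary with the hypothetical keys `OB`, `DL` and `BBB`; the compound IDs and values are made up for illustration and do not come from TCMSP.

```python
def passes_druggability(compound, ob_min=30.0, dl_min=0.18, bbb_min=-0.30):
    """Apply the OB, DL and BBB thresholds stated in the protocol."""
    return (
        compound["OB"] >= ob_min
        and compound["DL"] >= dl_min
        and compound["BBB"] >= bbb_min
    )


def filter_compounds(compounds):
    """Return only the compounds that pass all three criteria."""
    return [c for c in compounds if passes_druggability(c)]


# Illustrative records only (hypothetical IDs and values).
toy_compounds = [
    {"id": "CPD_A", "OB": 45.2, "DL": 0.25, "BBB": 0.10},   # passes
    {"id": "CPD_B", "OB": 12.0, "DL": 0.40, "BBB": 0.50},   # fails OB
    {"id": "CPD_C", "OB": 60.0, "DL": 0.18, "BBB": -0.30},  # passes (boundary)
    {"id": "CPD_D", "OB": 35.0, "DL": 0.30, "BBB": -0.90},  # fails BBB
]
selected = filter_compounds(toy_compounds)
```

The thresholds are exposed as keyword arguments so that a different cohort can be filtered with looser or stricter criteria without changing the function body.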

Prepare DeepCPI input file

Timing: ∼2 h (variable)

DeepCPI requires two layers of information for DTI prediction: “the InChI information of drugs” and “the amino acid sequence of target gene-encoding proteins”.

Prepare a txt file (e.g., “Drugbank_Approved.txt” or “TCM_selected.txt”) containing the InChI information for each drug (Figure 1B, Supplemental File S1).
Prepare a txt file (e.g., “Coronaviridae_HDGs.txt”) containing the amino acid sequence for each target protein (Figure 1B, Supplemental File S1). The amino acid sequences are extracted from the UniProt database (https://www.uniprot.org/).
Save the two files (“Coronaviridae_HDGs.txt” and “Drugbank_Approved.txt”) under the same directory.
Open Terminal. Change directory to where the files (“Coronaviridae_HDGs.txt” and “Drugbank_Approved.txt”) are located by typing “cd /your/working/path”.
Run the Python script “DrugTargtPairGenerator.py” by typing “python DrugTargtPairGenerator.py --f1 Coronaviridae_HDGs.txt --f2 Drugbank_Approved.txt” to generate a merged txt file (e.g., “Drug_Target_Pair.txt”) containing every possible drug-target pair (Figure 1C).
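Conceptually, the merged input is the Cartesian product of the drug list and the target list. The sketch below illustrates that idea; it is not the code of “DrugTargtPairGenerator.py”. It assumes tab-delimited two-column inputs, and the output column order is a guess (the actual layout is the one shown in Figure 1C). The drug IDs and the short sequences are placeholders.

```python
import itertools


def read_two_column(lines, sep="\t"):
    """Parse 'identifier<sep>value' lines, skipping blank lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        # Split on the first separator only, so the value is kept intact.
        key, value = line.split(sep, 1)
        records.append((key.strip(), value.strip()))
    return records


def generate_pairs(drugs, targets):
    """Yield one (drug_id, inchi, gene, sequence) tuple per drug-target pair."""
    for (drug_id, inchi), (gene, seq) in itertools.product(drugs, targets):
        yield (drug_id, inchi, gene, seq)


# Placeholder inputs: two small molecules and two made-up short sequences.
drug_lines = [
    "DRUG_1\tInChI=1S/H2O/h1H2",
    "DRUG_2\tInChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
]
target_lines = ["GENE_A\tMKTAYIAK", "GENE_B\tMSLLTEVE"]

pairs = list(generate_pairs(read_two_column(drug_lines),
                            read_two_column(target_lines)))
```

With the cohort sizes used in this protocol, the same product gives 2457 × 165 = 405,405 pairs, which matches the number of hypothetical DTI pairs quoted in the overview.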

DTI prediction by DeepCPI

Timing: ∼2 h

Run the DeepCPI pipeline and calculate the DeepCPI score for each drug-target pair.
Paste the merged input file (e.g., “Drug_Target_Pair.txt”) into the DeepCPI folder and rename it as “example.tsv”. (#DeepCPI uses “example.tsv” as default input file)
Open Terminal. Activate the conda environment by typing “source activate DeepCPI”.
Change directory to the home directory of the DeepCPI folder named “DeepCPI-master” by typing “cd [The path of DeepCPI]” (e.g., “cd /…/DeepCPI-master”).
Run the DeepCPI pipeline under the DeepCPI folder by typing the command “python DeepCPI.py”. A file named “Prediction_results.tsv” is generated at the end of the run. Each drug-target pair is assigned a DeepCPI score (range 0–1) representing its interaction potential. A higher score indicates higher interaction potential.
Change directory to where the files (“Prediction_results.tsv”, “Coronaviridae_HDGs.txt”, and “Drugbank_Approved.txt”, stored under the same directory) are located by typing “cd /your/working/path”.
Run the Python script “MatricesGenerator.py” by typing “python MatricesGenerator.py --f1 Prediction_results.tsv --f2 Coronaviridae_HDGs.txt --f3 Drugbank_Approved.txt” to create a score matrix named “Prediction_results.matrix.txt” containing the DeepCPI score $s_{ij}$ for each drug-target pair, where l refers to the length of the drug list and k refers to the length of the target list:

$$S = \begin{pmatrix} s_{11} & \cdots & s_{1k} \\ \vdots & \ddots & \vdots \\ s_{l1} & \cdots & s_{lk} \end{pmatrix}$$

Run the Python script “FilterOutNonSignificant.py” by typing “python FilterOutNonSignificant.py -f Prediction_results.matrix.txt -c 0.892” to filter out the non-significant DTI scores and only keep the confident scores. The output file is “Prediction_results.matrix.filtered.txt”.

The optimal DeepCPI score threshold (0.892, sensitivity: 37.2%, specificity: 86.8%) is determined by receiver operating characteristics (ROC) analysis with benchmark datasets (Li et al., 2021). This pre-defined threshold may change when different benchmark datasets are used to evaluate DeepCPI performance. Once defined, the threshold is applicable to any DTI analysis using DeepCPI for different target gene sets and drug sets.

When more DTI prediction algorithms are applied to alleviate the bias of each algorithm and improve the prediction precision, each method generates a prediction score for the same drug-target pair. However, the score distribution usually differs between methods. To make these DTI scores comparable, a z-score based normalization is recommended, as exemplified in the following steps that standardize the DeepCPI score. DTI scores derived from other prediction algorithms can be normalized in a similar manner.

Open and run the R script “ZscoreNormalization.Rmd” to generate z-scores in a file named “z_Prediction_results.txt”, where x is an original score, μ is the mean value of the original scores and σ is the standard deviation of the original scores:

$$z = \frac{x - \mu}{\sigma}$$

Open Terminal. Change directory to where the files (“z_Prediction_results.txt”, “Coronaviridae_HDGs.txt”, and “Drugbank_Approved.txt”, stored under the same directory) are located by typing “cd /your/working/path”.
Run the Python script “MatricesGenerator.py” by typing “python MatricesGenerator.py --f1 z_Prediction_results.txt --f2 Coronaviridae_HDGs.txt --f3 Drugbank_Approved.txt”. This command creates a z-score matrix named “z_Prediction_results.matrix.txt” containing the standardized DeepCPI score $z_{ij}$ for each drug-target pair, where l refers to the length of the drug list and k refers to the length of the target list:

$$Z = \begin{pmatrix} z_{11} & \cdots & z_{1k} \\ \vdots & \ddots & \vdots \\ z_{l1} & \cdots & z_{lk} \end{pmatrix}$$

Run the Python script “FilterOutNonSignificant.py” by typing “python FilterOutNonSignificant.py -f z_Prediction_results.matrix.txt -c 0.641”. This command filters out the non-significant DTI scores and only keeps the confident scores. The output file is “z_Prediction_results.matrix.filtered.txt”.

The optimal standardized DeepCPI score threshold (0.641, sensitivity: 73%, specificity: 51.9%) is determined by receiver operating characteristics (ROC) analysis with benchmark datasets (Li et al., 2021). This pre-defined threshold may change when different benchmark datasets are used. Once defined, the threshold for the standardized DeepCPI score is applicable to different target gene sets and drug sets.
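The matrix construction, z-score normalization and threshold filtering described in this section can be illustrated with a compact sketch. It is not the code of “MatricesGenerator.py”, “ZscoreNormalization.Rmd” or “FilterOutNonSignificant.py”. Three assumptions are made here: scores are normalized globally over all pairs, σ is the sample standard deviation (as R's `sd()` computes), and scores below the cutoff are replaced by 0. The drug and target names are placeholders.

```python
import statistics


def build_matrix(predictions, drugs, targets):
    """Arrange {(drug, target): score} into an l x k nested list.

    Rows follow the drug list (length l); columns follow the target list
    (length k).
    """
    return [[predictions[(d, t)] for t in targets] for d in drugs]


def zscore_matrix(matrix):
    """Standardize every score as z = (x - mu) / sigma over all pairs."""
    values = [v for row in matrix for v in row]
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # assumption: sample SD
    return [[(v - mu) / sigma for v in row] for row in matrix]


def filter_matrix(matrix, cutoff):
    """Keep scores >= cutoff; set the rest to 0.0 (assumption)."""
    return [[v if v >= cutoff else 0.0 for v in row] for row in matrix]


# Toy scores for two drugs and two targets (placeholder names and values).
toy_predictions = {
    ("DRUG_1", "GENE_A"): 0.95, ("DRUG_1", "GENE_B"): 0.10,
    ("DRUG_2", "GENE_A"): 0.50, ("DRUG_2", "GENE_B"): 0.90,
}
raw = build_matrix(toy_predictions, ["DRUG_1", "DRUG_2"], ["GENE_A", "GENE_B"])
raw_filtered = filter_matrix(raw, 0.892)            # raw-score threshold
z_filtered = filter_matrix(zscore_matrix(raw), 0.641)  # standardized threshold
```

The two cutoffs (0.892 for raw scores, 0.641 for standardized scores) are the ones reported in this protocol; they apply to the real benchmark-calibrated score distributions, so the toy values above only demonstrate the mechanics.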

Prioritize repurposed drug candidates

Timing: ∼10 min

Repurposed drug candidates are ranked primarily according to their targeting range (the number of targets) and targeting strength (the interaction potential for each target).

Prioritize the drug candidates using P_score, which only considers the HDG target-associated DTIs. P_score is calculated for each drug candidate by the following formula, where $s_j$ represents the filtered DeepCPI score of the drug for the j-th target and k refers to the length of the target list:

$$\mathrm{P\_score} = \frac{1}{k}\sum_{j=1}^{k} s_j$$

Open the file “Prediction_results.matrix.filtered.txt” in Excel. Drugs are listed in rows and targets are listed in columns.
For each drug, calculate P_score using the above formula (AVERAGE function). The higher the P_score, the higher the corresponding drug is prioritized. The drug candidates can be ranked according to their P_score.

If using normalized z-scores, calculate P_score for each drug candidate and each DTI prediction method by the following formula, exemplified by DeepCPI, where $z_j$ represents the filtered standardized DeepCPI score of the drug for the j-th target and k refers to the length of the target list:

$$\mathrm{P\_score}_{\mathrm{DeepCPI}} = \frac{1}{k}\sum_{j=1}^{k} z_j$$

Drug candidates can then be ranked by integrative consideration of the multiple P_scores derived from the individual DTI prediction methods.
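The ranking step can also be done outside Excel. The sketch below computes P_score as the mean of a drug's filtered scores over all k targets and sorts the drugs in descending order. It assumes that filtered-out scores are stored as 0 rather than left blank; the function names and the drug IDs are hypothetical.

```python
def p_score(filtered_row):
    """Mean of the filtered scores of one drug over all k targets."""
    return sum(filtered_row) / float(len(filtered_row))


def rank_drugs(drug_ids, filtered_matrix):
    """Return (drug_id, P_score) tuples, highest P_score first."""
    scored = [(d, p_score(row)) for d, row in zip(drug_ids, filtered_matrix)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy filtered matrix: rows are drugs, columns are targets.
toy_ids = ["DRUG_1", "DRUG_2", "DRUG_3"]
toy_filtered = [
    [0.95, 0.0, 0.0],   # one strong predicted target
    [0.90, 0.90, 0.0],  # two predicted targets
    [0.0, 0.0, 0.0],    # no confident target
]
ranking = rank_drugs(toy_ids, toy_filtered)
```

Because the sum is divided by the full target count k, a drug with more confident targets ranks above a drug with a single slightly stronger target, which reflects both targeting range and targeting strength. Note that Excel's AVERAGE function skips empty cells, so if the filtered file leaves non-significant entries blank instead of 0, the spreadsheet result would differ from this sketch.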

Molecular docking analysis of top ranked drugs

Timing: ∼4 h

To further examine the potential binding interface and free energy between top ranked drugs and their predicted target proteins, molecular docking analysis can be performed. Using Baricitinib (one of the top ranked repurposed drugs against Coronaviridae viruses) and its predicted target DYRK1A as an example, molecular docking analysis is performed as in the following steps. The docking parameters may vary depending on the interrogated drug/target pair.

Prepare the ligand. Download the chemical structure file for Baricitinib (PubChem CID: 44205240) from the PubChem website (https://pubchem.ncbi.nlm.nih.gov/) in SDF format (named “Baricitinib.SDF”). Open PyMOL and load the ligand file “Baricitinib.SDF”. Export and save it as a “ligand.PDB” file. Open the AutoDock software and load the “ligand.PDB” file (Figure 2A).

Figure 2

Pre-processing procedures of molecular docking analysis

(A) Illustration of “Input the ligand” step by AutoDock software.

(B) Illustration of “Choose the torsions of the ligand” step in AutoDock.

(C) Illustration of “Output ligand.pdbqt file” step in AutoDock.

(D) Illustration of “Remove waters of protein” step in AutoDock.

(E) Illustration of “Add polar hydrogens of protein” step in AutoDock.

(F) Illustration of “Delete pre-embedded ligand” step in AutoDock.

(G) Example of “Delete the other chains and solvents of the protein” step (D chain of DYRK1A in 6SIE.pdb) in AutoDock.

Click “Ligand->Torsion Tree” and select the “Choose Torsions” module (Figure 2B). Red chemical bonds are non-rotatable and green chemical bonds are rotatable. Output and save the ligand as a “ligand.pdbqt” file (Figure 2C).

Prepare the protein receptor. The protein structure of DYRK1A (PDB: 6EIS) is downloaded from the RCSB PDB website (http://www1.rcsb.org) in PDB format. Open PyMOL and load the file “6SIE.pdb”. Remove waters (Figure 2D) and add polar hydrogens (Figure 2E). Choose the primary ligand of DYRK1A at the 321st amino acid position of the A chain, and remove the pre-embedded ligand (Figure 2F). Delete the other chains (B, C, and D chains of DYRK1A in 6SIE.pdb) and solvents of the protein (Figure 2G). Save the result as a “protein.pdb” file. Open the AutoDock software and load the “protein.pdb” file. Set the atoms using the “Assign AD4 type” module (Figure 3A).

Figure 3

Continued procedures of molecular docking analysis

(A) Illustration of setting the atoms using “Assign AD4 type” module in AutoDock software.

(B) Illustration of computing the Gasteiger charges for protein molecules in AutoDock.

(C) Illustration of exporting and saving as “protein.pdbqt” formatted file in AutoDock.

(D) Example of setting the center of grid box size to cover the active pocket in AutoDock.

(E) Illustration of outputting the Lamarckian GA result.

(F) Illustration of showing the interactions between ligand and protein.

(G) Illustration of analyzing different conformations of the ligand.

(H) Example of docking result showing the interaction between Baricitinib and DYRK1A.

Compute the Gasteiger charges for the protein molecule (Figure 3B). Export and save it as a “protein.pdbqt” file (Figure 3C).

Set the grid box. Open the “Grid” module and load the “protein.pdbqt” file. Set the map types and load the “ligand.pdbqt” file. Open the “Grid Box” module to set the position of the grid box. Set the center of the grid box: X center: -0.424, Y center: -16.948, Z center: -8.144. Then, set the number of points in the X (60), Y (60) and Z (60) dimensions of the grid box to cover the active pocket (Figure 3D). Save as a “dock.gpf” file.

Analyze the grid docking. Choose the “Docking” module, and load the protein and ligand files (“protein.pdbqt” and “ligand.pdbqt”). Click “Docking->Search Parameters” and choose the “Genetic Algorithm” module. Click “Docking->Docking Parameters” and use the default settings. Output the Lamarckian GA result and save it as a “dock.dpf” file (Figure 3E). Run the “AutoGrid” and “AutoDock” modules with the “dock.gpf” and “dock.dpf” files, respectively. A “dock.dlg” file is then generated. Open the “dock.dlg” file and the protein file (“protein.pdbqt”). Show the interactions between ligand and protein (Figure 3F). Analyze the conformations of the ligand by clicking the corresponding button (Figure 3G). The DashBoard shows the binding energy under different ligand conformations, with the lowest binding energy of -8.07 kcal/mol for the potential interaction between Baricitinib and the DYRK1A A chain. Output the complex interactions, and save them as a “result.pdbqt” file.

Visualize the results of docking. Open PyMOL and load the “result.pdbqt” file. Set the shape and color of the protein or the ligand. Display the background as “white”. Output and save the picture of the docking result as a “docking.png” file (Figure 3H).

Other molecular docking software can also be utilized. The binding interface and free energy may differ when using different molecular docking platforms. If no structure of the interrogated target protein is available in the PDB, protein structure prediction by homology modeling may be performed. If only an apo-structure is available, in which the target protein is not in complex with drugs or small molecules, binding pocket prediction or blind docking can be performed with molecular docking software. If a deeper computational investigation of the binding-function relationship is needed, molecular dynamics (MD) simulation can be performed as described elsewhere (Maximova et al., 2016; Mei et al., 2021; Yang et al., 2020).
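The lowest binding energy can also be read from the docking log without the graphical interface. The following is a minimal sketch under one assumption: that the “dock.dlg” file reports each docked conformation with a line of the form “Estimated Free Energy of Binding = -8.07 kcal/mol”, as AutoDock 4 logs typically do. The function names are hypothetical and the excerpt used in the example is a made-up fragment, not real output.

```python
import re

# Assumed line format in the .dlg log; adjust the pattern if your log differs.
ENERGY_RE = re.compile(
    r"Estimated Free Energy of Binding\s*=\s*([-+]?\d+(?:\.\d+)?)\s*kcal/mol"
)


def binding_energies(dlg_text):
    """Return all binding energies (kcal/mol) found in the log text."""
    return [float(m.group(1)) for m in ENERGY_RE.finditer(dlg_text)]


def best_binding_energy(dlg_text):
    """Return the lowest (most favorable) energy, or None if none is found."""
    energies = binding_energies(dlg_text)
    return min(energies) if energies else None


# Made-up excerpt that only mimics the assumed line format.
toy_log = """
DOCKED: USER    Estimated Free Energy of Binding    =   -7.52 kcal/mol
DOCKED: USER    Estimated Free Energy of Binding    =   -8.07 kcal/mol
DOCKED: USER    Estimated Free Energy of Binding    =   -6.10 kcal/mol
"""
best = best_binding_energy(toy_log)
```

For a real run, the text would be obtained with `open("dock.dlg").read()`; the value returned by `best_binding_energy` should then agree with the lowest energy displayed in the DashBoard.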

Expected outcomes

In this protocol, we describe an in silico drug repositioning workflow to identify potential antiviral drugs against Coronaviridae viruses using HDGs as drug targets. A complete table listing the predicted DTI for each drug-target pair is generated, and a ranked list of the repurposed drug candidates is provided (Table S2). If positive control drugs with definite DTIs in other scenarios are included, they are expected to appear among the top positions of the ranked list. The binding details between top predicted drugs and targets are illustrated by molecular docking analysis. These results may expedite drug development for infectious diseases caused by Coronaviridae viruses, such as COVID-19. This strategy should help to repurpose “old drugs” for novel antiviral uses by facilitating the selection of lead compounds for in-depth experimental and clinical evaluation.

Limitations

This protocol has several limitations. Firstly, the target gene set of Coronaviridae-specific HDGs may not be complete, and the variation in the strength of perturbation impact between different HDGs is ignored. Secondly, only one DTI prediction algorithm (DeepCPI) is illustrated here; it is highly recommended to incorporate more independent algorithms to increase the precision and reduce the bias of DTI prediction. Thirdly, this protocol does not include validation steps. The top ranked repurposed drug candidates should be selected and experimentally validated by in vitro assays for their cytotoxicity, antiviral activity and physical drug-target interaction before proceeding to more advanced evaluations.

Troubleshooting

Problem 1

The software and algorithms used in this protocol do not run through properly (Before you begin-Software setup and installation).

Potential solution

Double check the computer settings, make sure the downloaded versions of the software or algorithms are correct, and install them according to their manuals. Use the test data or files provided in this study to evaluate whether the software and algorithms are working properly.

Problem 2

Only a limited number of HDGs can be collected for a specific type of virus (steps 1–3).

Potential solution

An insufficient number of target genes may decrease the probability and precision of drug repositioning due to low coverage of true HDGs. We recommend expanding the HDG set by additionally considering HDG data from closely related viruses, for example, viruses within the same viral family rather than only a certain viral species.

Problem 3

DeepCPI is successfully installed and runs with the test data embedded in the DeepCPI folder; however, it fails to generate results using user-provided data (steps 6–12).

Potential solution

Make sure to execute the program under the home directory of the DeepCPI folder, double check the format of the input file, and remove any delimiter in the InChI value that may change the data structure.

Problem 4

It is difficult to determine the position of the grid box for the protein during molecular docking (step 16).

Potential solution

We recommend trying the following steps: firstly, refer to the literature to identify the potential active pocket of the protein; secondly, use the “blind docking” or “binding pocket prediction” approach in AutoDock software.

Problem 5

The positive control drugs (if any) are not in the top positions of the prioritized rank list of repurposed drugs (Expected outcomes).

Potential solution

Carefully select the target gene set, make sure the positive control drugs are within the interrogated drug cohort, and/or apply multiple DTI prediction algorithms for drug repositioning.

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Teng Fei (feiteng@mail.neu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

This published article includes all datasets generated or analyzed during this study. The Python and R scripts can be found at the GitHub repository for this protocol (https://github.com/zexuneu/computational-framework-of-host-based-drug-repositioning).

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Deposited data

Target gene set of Coronaviridae-specific HDGs with amino acid sequence information	Supplemental File S1	Coronaviridae_HDGs.txt
Approved drug list with InChI information	Supplemental File S1	Drugbank_Approved.txt
Selected natural compound list with InChI information	Supplemental File S1	TCM_selected.txt
Drug candidate cohort information	Table S1	Drug_cohort_information.xlsx
Predicted DTI and ranked list of repositioned drugs against Coronaviridae viruses	Table S2	DTI_and_ranked_drug_list.xlsx
DrugTargtPairGenerator.py	This study	https://github.com/zexuneu/computational-framework-of-host-based-drug-repositioning
MatricesGenerator.py	This study	https://github.com/zexuneu/computational-framework-of-host-based-drug-repositioning
FilterOutNonSignificant.py	This study	https://github.com/zexuneu/computational-framework-of-host-based-drug-repositioning
ZscoreNormalization.Rmd	This study	https://github.com/zexuneu/computational-framework-of-host-based-drug-repositioning

Software and algorithms

DeepCPI	(Wan et al., 2019)	https://github.com/FangpingWan/DeepCPI
AutoDock	(Morris et al., 2009)	http://autodock.scripps.edu
MGLTools	MGLTools Website	http://mgltools.scripps.edu/downloads
PyMOL	Schrödinger	https://pymol.org/2/

1. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics.

Authors: Tatiana Maximova; Ryan Moffatt; Buyong Ma; Ruth Nussinov; Amarda Shehu
Journal: PLoS Comput Biol Date: 2016-04-28

2. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility.

Authors: Garrett M Morris; Ruth Huey; William Lindstrom; Michel F Sanner; Richard K Belew; David S Goodsell; Arthur J Olson
Journal: J Comput Chem Date: 2009-12

3. LARMD: integration of bioinformatic resources to profile ligand-driven protein dynamics with a case on the activation of estrogen receptor.

Authors: Jing-Fang Yang; Fan Wang; Yu-Zong Chen; Ge-Fei Hao; Guang-Fu Yang
Journal: Brief Bioinform Date: 2020-12-01

4. Drug repurposing: progress, challenges and recommendations.

Authors: Sudeep Pushpakom; Francesco Iorio; Patrick A Eyers; K Jane Escott; Shirley Hopper; Andrew Wells; Andrew Doig; Tim Guilliams; Joanna Latimer; Christine McNamee; Alan Norris; Philippe Sanseau; David Cavalla; Munir Pirmohamed
Journal: Nat Rev Drug Discov Date: 2018-10-12

5. DrugBank 5.0: a major update to the DrugBank database for 2018.

Authors: David S Wishart; Yannick D Feunang; An C Guo; Elvis J Lo; Ana Marcu; Jason R Grant; Tanvir Sajed; Daniel Johnson; Carin Li; Zinat Sayeeda; Nazanin Assempour; Ithayavani Iynkkaran; Yifeng Liu; Adam Maciejewski; Nicola Gale; Alex Wilson; Lucy Chin; Ryan Cummings; Diana Le; Allison Pon; Craig Knox; Michael Wilson
Journal: Nucleic Acids Res Date: 2018-01-04

6. DeepCPI: A Deep Learning-based Framework for Large-scale in silico Drug Screening.

Authors: Fangping Wan; Yue Zhu; Hailin Hu; Antao Dai; Xiaoqing Cai; Ligong Chen; Haipeng Gong; Tian Xia; Dehua Yang; Ming-Wei Wang; Jianyang Zeng
Journal: Genomics Proteomics Bioinformatics Date: 2020-02-06

7. Protocol for hit-to-lead optimization of compounds by auto in silico ligand directing evolution (AILDE) approach.

Authors: Longcan Mei; Fengxu Wu; Gefei Hao; Guangfu Yang
Journal: STAR Protoc Date: 2021-02-01

8. A computational framework of host-based drug repositioning for broad-spectrum antivirals against RNA viruses.

Authors: Zexu Li; Yingjia Yao; Xiaolong Cheng; Qing Chen; Wenchang Zhao; Shixin Ma; Zihan Li; Hu Zhou; Wei Li; Teng Fei
Journal: iScience Date: 2021-02-05

9. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR.

Authors: Wei Li; Johannes Köster; Han Xu; Chen-Hao Chen; Tengfei Xiao; Jun S Liu; Myles Brown; X Shirley Liu
Journal: Genome Biol Date: 2015-12-16

10. CRISP-view: a database of functional genetic screens spanning multiple phenotypes.

Authors: Yingbo Cui; Xiaolong Cheng; Qing Chen; Bicna Song; Anthony Chiu; Yuan Gao; Tyson Dawson; Lumen Chao; Wubing Zhang; Dian Li; Zexiang Zeng; Jijun Yu; Zexu Li; Teng Fei; Shaoliang Peng; Wei Li
Journal: Nucleic Acids Res Date: 2021-01-08