
A Web Tool for Ranking Candidate Drugs Against a Selected Disease Based on a Combination of Functional and Structural Criteria.

Evangelos Karatzas^1,2, George Minadakis^2,3, George Kolios^4, Alex Delis^1, George M Spyrou^2,3.

Abstract

Drug repurposing techniques allow existing drugs to be tested against diseases outside their initial spectrum, reducing costs and shortening the long time-frames of new drug development. In silico drug repurposing further speeds up the process, either by proposing drugs capable of reversing the transcriptomic profile of a disease or by suggesting drugs that share targets or structural similarity with other drugs of similar mode of action. Such methods usually return a number of potential repurposed drugs that need to be tested against the disease in in vitro, pre-clinical and clinical studies. Thus, a more sophisticated ranking of candidate drugs is crucial, so that testing can start from the most promising chemical substances. To support this decision process, we present CoDReS (Composite Drug Reranking Scoring), a web-based drug (re-)ranking tool that combines an initial drug ranking (i.e. a repurposing score or hypothesis/potentiality score) with a functional score of each drug with respect to the disease under study and a structural score derived from potential drug-likeness violations. Furthermore, structural similarity clustering is applied to the considered drugs, and a handful of structural exemplars are suggested for further in vitro and in vivo validation. Users can filter the results further by examining the structural similarity of the candidate drugs to drugs that have failed against the queried disease in clinical trials. CoDReS is publicly available online at http://bioinformatics.cing.ac.cy/codres.


Keywords: Cheminformatics; Data mining; Drug discovery; Drug ranking

Year: 2019 PMID: 31360332 PMCID: PMC6637175 DOI: 10.1016/j.csbj.2019.05.010

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Transcriptomic-based computational drug repurposing (DR) tools, such as Connectivity Map [1] and L1000CDS2 [2], compare a disease-related gene expression profile with a collection of stored expression profiles that capture cellular responses to various perturbations. These tools return lists of candidate repurposed drugs, which can be ordered by their inhibition score. The inhibition score describes the potential of a chemical substance to revert the perturbed gene signature of a disease back to its "normal-healthy" state. Although the inhibition score might give insight into the potency of a drug against a disease, it alone cannot guarantee success in a clinical trial. On the other hand, cheminformatics tools such as ChemMine Tools [3], and programming packages such as Rcpi [4] and ChemmineR [5], can suggest drugs with similar structure to drugs of known effectiveness. Such drugs may also share a similar mode of action. The known effectiveness may concern a specific disease-related mechanism or diseases phenotypically similar to the targeted disease. However, the derived similarity score is often not enough to deem a drug an appropriate candidate against a disease. Other types of drug information should also be examined. These include the candidate drug's functional relation to the disease and its binding affinity to disease-related gene targets. They also include its drug-likeness, as evaluated by structural rules that might render it inappropriate for clinical trials.
To score these different drug aspects and provide a more meaningful ranking of candidate repurposed drugs, we developed the web-based tool CoDReS (Composite Drug Reranking Score). CoDReS builds on and extends the methodology introduced in [6] in the following ways: it integrates information from updated biological databases, incorporates binding affinity scores between ligands and proteins, and evaluates drug-likeness. It also presents structural similarities between the input drugs and drugs that have already failed against the queried disease in clinical trials. A summary of the CoDReS pipeline is depicted in Fig. 1.

Fig. 1

CoDReS summary figure.

Tool Description

Scoring Scheme

A composite score (from here on referred to as CoDReS) is calculated for each drug. It is the normalized weighted sum of the initial a priori score (aS), a functional score (FS) and a structural score (StS):

$$\mathrm{CoDReS} = \frac{w_{aS}\,aS + w_{FS}\,FS + w_{StS}\,StS}{w_{aS} + w_{FS} + w_{StS}}$$

The weights $w_{aS}$, $w_{FS}$ and $w_{StS}$ are user-defined parameters, with equal default values. They determine the desired influence of each component (a priori, functional and structural score, respectively) on the final score. The a priori scores can be uploaded by the user and are automatically normalized to the unit interval [0, 1] by dividing by the absolute maximum a priori score.

The functional score requires two parameters. The first is the Confidence Score (CS), which reflects the gene-disease association. The second is $K_i$, the inhibitory constant measured in nM, which represents the reciprocal of the binding affinity between the inhibitor (drug) and the enzyme (target) [7]. The smaller the $K_i$, the greater the binding affinity. The FS of a drug $d$ for disease $D$ is a sum over every gene target of the drug that has been related to the queried disease. Each term is the Confidence Score multiplied by the inverse of $K_i$:

$$FS_d = \sum_{g \in T_d \cap G_D} \frac{CS_{g,D}}{K_i(d,g)}$$

where $T_d$ is the set of gene targets of $d$ and $G_D$ the set of genes associated with $D$. Each drug's FS is finally normalized to [0, 1] by dividing by the maximum FS.

The structural score measures a substance's drug-likeness based on Lipinski's "rule of 5" [8] and Veber's rule [9]. According to the Lipinski rules, a drug that is to be orally active in humans should:
(i) have ≤5 hydrogen bond donors,
(ii) have ≤10 hydrogen bond acceptors,
(iii) have a molecular weight <500 Da, and
(iv) have an octanol-water partition coefficient (log P) ≤5.
Veber's rule further requires that the chemical substance:
(v) contain ≤10 rotatable bonds, and
(vi) have a polar surface area not exceeding 140 Å².
The final StS for each drug is a value in [0, 1], calculated as

$$StS = \frac{6 - V}{6}$$

where $V$ is the number of rules the drug violates and 6 is the maximum number of structural rules that a drug might violate.
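The scoring scheme above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the CoDReS R implementation. The function names, the dictionary-based inputs and the toy drugs, genes, Confidence Scores and Ki values are all hypothetical. The sketch assumes that every disease-related target already has a Ki value; the Development section describes the fallback used when BindingDB has none. It also assumes the six-rule violation count V has already been computed.

```python
from __future__ import annotations


def normalize_by_max(scores: dict[str, float]) -> dict[str, float]:
    """Divide each score by the largest absolute score, as described for aS and FS."""
    top = max((abs(v) for v in scores.values()), default=0.0)
    if top == 0:
        return {drug: 0.0 for drug in scores}
    return {drug: value / top for drug, value in scores.items()}


def raw_functional_score(targets: set[str],
                         disease_cs: dict[str, float],
                         ki_nm: dict[str, float]) -> float:
    """Sum of CS / Ki (Ki in nM) over the drug's targets that are linked to the disease."""
    return sum(disease_cs[gene] / ki_nm[gene] for gene in targets if gene in disease_cs)


def structural_score(violations: int) -> float:
    """StS = (6 - V) / 6 for V violated Lipinski/Veber rules."""
    if not 0 <= violations <= 6:
        raise ValueError("violations must lie between 0 and 6")
    return (6 - violations) / 6


def codres(a_s: float, fs: float, sts: float,
           w_as: float = 1.0, w_fs: float = 1.0, w_sts: float = 1.0) -> float:
    """Weighted sum of the three normalized components divided by the sum of weights."""
    return (w_as * a_s + w_fs * fs + w_sts * sts) / (w_as + w_fs + w_sts)


if __name__ == "__main__":
    # Toy inputs (hypothetical drugs, genes and values).
    disease_cs = {"G1": 0.5, "G2": 0.2}
    targets = {"drugA": {"G1", "G2"}, "drugB": {"G2", "G3"}}
    ki_nm = {"drugA": {"G1": 10.0, "G2": 100.0}, "drugB": {"G2": 50.0, "G3": 5.0}}
    a_priori = {"drugA": 2.0, "drugB": 4.0}
    violations = {"drugA": 0, "drugB": 3}

    a_s = normalize_by_max(a_priori)
    fs = normalize_by_max({d: raw_functional_score(targets[d], disease_cs, ki_nm[d])
                           for d in targets})
    for drug in targets:
        score = codres(a_s[drug], fs[drug], structural_score(violations[drug]))
        print(drug, f"{score:.3f}")  # drugA 0.833, drugB 0.526
```

Dividing by the sum of the (non-negative) weights keeps CoDReS in [0, 1] whenever the three components are in [0, 1]. With equal default weights, the composite reduces to a plain average of aS, FS and StS.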

Development

The static components of the CoDReS user interface (UI) are developed in PHP, HTML, CSS (Bootstrap) and JavaScript (AJAX). The dynamic components of the UI are refreshed via PHP and back-end R scripts. Several data repositories have been downloaded, parsed and integrated into a MySQL database, which in turn serves the CoDReS web application. Information on the database releases, versions and links is given in Table 1.

Table 1

Information regarding resources integrated into CoDReS.

Database Name	Link	File	Current Version	Last Update
BindingDB	https://www.bindingdb.org/bind/chemsearch/marvin/SDFdownload.jsp?all_download=yes	BindingDB_All_2019m1.tsv.zip	–	2019/02
ChEMBL	ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/	chembl_24_1_mysql.tar.gz	24.1	2018/06
DGIdb	http://www.dgidb.org/downloads	Interactions TSV	3.0.2	2018/01
DisGeNET	http://www.disgenet.org/downloads	ALL gene-disease associations	6.0	2019/02
DrugBank	https://www.drugbank.ca/releases/latest#open-data	DrugBank Vocabulary	5.1.2	2018/12
	https://www.drugbank.ca/releases/latest#protein-identifiers	Drug Target Identifiers - All	5.1.2	2018/12
	https://www.drugbank.ca/releases/latest#structures	Structural External Links - All	5.1.2	2018/12
DrugCentral	http://drugcentral.org/download	Drug-target interaction	10.4	2018/08
HGNC	https://www.genenames.org/cgi-bin/download	Approved Symbol, Synonyms	–	2019/02
repoDB	http://apps.chiragjpgroup.org/repoDB/	full repoDB dataset	1.2	2017/07
UniProt	ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism	HUMAN_9606_idmapping.dat	–	2019/02

CoDReS works with drug synonyms downloaded from DrugBank [10], a drug-centric online database that provides detailed information on drugs and their gene targets. All other integrated databases that contain drug names have been parsed, and their drug names have been translated to DrugBank's usual names. Each input drug is also assigned a DrugBank identifier where possible; otherwise it receives the value "unassigned". At the gene level, CoDReS works with gene synonyms derived from the HGNC [11] database. Every other database that contains gene identifiers is parsed and translated according to these HGNC gene synonyms. The backend of CoDReS is developed in R.

The first parameter of the FS, the Confidence Score, is taken from DisGeNET [12]. DisGeNET is an online database that links genes to diseases by integrating information from various biological databases, and it assigns a score to each gene-disease association. The Ki value of a drug-protein pair is queried from BindingDB [13], another online database, which contains experimentally measured binding affinities between proteins and ligands. To link the databases, genes are first converted to proteins through UniProt [14], an online knowledgebase that hosts annotated sequences of over 120 million proteins and provides protein visualization methods. BindingDB is then queried to link the proteins to drug identifiers from either DrugBank or ChEMBL [15]; ChEMBL is another drug-related database similar to DrugBank. If no Ki value is found for a drug-protein pair, the application uses the median Ki value of BindingDB (184 nM). The mean (120523.41 nM) is not used because it is inflated by the database's outliers. The gene targets of the input drugs are retrieved from the parsed DrugBank, DrugCentral [16] and DGIdb [17] databases.
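The translation of input names to DrugBank usual names, and the Ki lookup with a median fallback, can be sketched in Python as below. This is a hypothetical illustration, not the CoDReS R backend. The tuple layout of the vocabulary, the case-insensitive matching, the helper names and the toy identifiers (DBX0001, PTOY1) are all assumptions. Only the "unassigned" value and the use of a median Ki for missing pairs come from the text.

```python
from __future__ import annotations

import statistics

UNASSIGNED = "unassigned"


def build_synonym_index(vocabulary):
    """Map every lower-cased name or synonym to (usual name, DrugBank ID).

    `vocabulary` is an iterable of (drugbank_id, usual_name, synonyms) tuples
    (a hypothetical layout, not the DrugBank file format).
    """
    index: dict[str, tuple[str, str]] = {}
    for db_id, usual_name, synonyms in vocabulary:
        for alias in (usual_name, *synonyms):
            index.setdefault(alias.strip().casefold(), (usual_name, db_id))
    return index


def resolve_drug(input_name: str, index: dict[str, tuple[str, str]]) -> tuple[str, str]:
    """Return (usual name, ID), or (input name, 'unassigned') when no synonym matches."""
    return index.get(input_name.strip().casefold(), (input_name, UNASSIGNED))


def lookup_ki(drug_id: str, protein: str,
              ki_table: dict[tuple[str, str], float], fallback_nm: float) -> float:
    """Ki (nM) for a drug-protein pair, or `fallback_nm` when no value was measured."""
    return ki_table.get((drug_id, protein), fallback_nm)


if __name__ == "__main__":
    index = build_synonym_index([("DBX0001", "Toyzumab", ["toy-1", "TZM"])])
    print(resolve_drug(" tzm ", index))       # ('Toyzumab', 'DBX0001')
    print(resolve_drug("Unknownol", index))   # ('Unknownol', 'unassigned')

    # A skewed toy Ki sample: one outlier drags the mean far above the median.
    toy_ki = [2.0, 15.0, 40.0, 300.0, 5.0e6]
    fallback = statistics.median(toy_ki)      # 40.0
    print(fallback, statistics.mean(toy_ki))
    print(lookup_ki("DBX0001", "PTOY1", {}, fallback))  # 40.0
```

Using the median rather than the mean keeps a single extreme measurement from dominating the default value. This is the rationale the text gives for choosing 184 nM over 120523.41 nM.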
The StS of each drug is calculated with the Rcpi package for R and requires the drug's molecular structure as input. Each molecular structure is derived from its simplified molecular-input line-entry system (SMILES) string, a line notation that describes the structure of chemical species using short ASCII strings. CoDReS attempts to map either DrugBank identifiers or ChEMBL drug names to SMILES. For every violated rule (the Lipinski and Veber rules listed above), a drug receives one violation point, so the worst possible outcome is six violations. If no SMILES is available for a drug, the drug is conservatively assigned an StS of zero, i.e. the worst-case scenario with the maximum number of violations.

Another important aspect of CoDReS is that it highlights the highest-ranked drugs of structural clusters as exemplars. To form the clusters, CoDReS applies affinity propagation clustering, via the R package APCluster [18], to the similarity matrix of the input drugs' fingerprints. Specifically, the calcDrugFPSim function of the Rcpi package calculates the similarity matrix, with a compact E-State fragment fingerprint type and the Tanimoto metric as arguments. The structural exemplars are presented as good candidate disease inhibitors for further investigation, since different structural properties might target different biological mechanisms of a disease phenotype.

Finally, if clinical trials for the queried disease have produced failed drugs, the structural similarity between these compounds and the input drugs is measured. For this purpose, the online repoDB [19] dataset has been parsed, keeping the suspended, terminated and withdrawn drugs for each disease identifier. The execution pipeline, together with all integrated databases and packages, is depicted in Fig. 2.
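The violation count that feeds StS = (6 - V) / 6 can be sketched as below. CoDReS obtains the descriptors through Rcpi from each drug's SMILES. Here, the descriptors are assumed to be precomputed and passed in as a dictionary, and the key names and toy values are hypothetical. A missing structure is encoded as None and counted as six violations, mirroring the conservative worst case described above.

```python
from __future__ import annotations

from typing import Optional

# (hypothetical descriptor key, test that returns True when the rule is satisfied)
RULES = (
    ("h_bond_donors", lambda v: v <= 5),             # Lipinski (i)
    ("h_bond_acceptors", lambda v: v <= 10),         # Lipinski (ii)
    ("molecular_weight_da", lambda v: v < 500),      # Lipinski (iii)
    ("log_p", lambda v: v <= 5),                     # Lipinski (iv)
    ("rotatable_bonds", lambda v: v <= 10),          # Veber (v)
    ("polar_surface_area_a2", lambda v: v <= 140),   # Veber (vi)
)


def count_violations(descriptors: Optional[dict[str, float]]) -> int:
    """One point per violated rule; no structure (no SMILES) counts as all rules violated."""
    if descriptors is None:
        return len(RULES)
    return sum(1 for key, satisfied in RULES if not satisfied(descriptors[key]))


if __name__ == "__main__":
    small = {"h_bond_donors": 1, "h_bond_acceptors": 4, "molecular_weight_da": 180.0,
             "log_p": 1.2, "rotatable_bonds": 3, "polar_surface_area_a2": 64.0}
    bulky = {"h_bond_donors": 6, "h_bond_acceptors": 12, "molecular_weight_da": 720.0,
             "log_p": 2.0, "rotatable_bonds": 12, "polar_surface_area_a2": 200.0}
    for name, desc in [("small", small), ("bulky", bulky), ("no SMILES", None)]:
        v = count_violations(desc)
        print(name, v, round((6 - v) / 6, 3))  # StS = (6 - V) / 6
```

In a Python pipeline, the six descriptors could come from a cheminformatics toolkit such as RDKit (not shown). The counting logic stays the same regardless of where the descriptors come from.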

Fig. 2

CoDReS integration scheme.

User Execution

The user is required to upload a file containing the drug names and, optionally, their respective a priori scores, such as those returned by a drug repurposing tool. As soon as the input file is uploaded, a histogram and a distribution diagram of the input scores are generated (Fig. 3). The weights denoting the importance of aS, FS and StS are user-selected and have equal default values. The user must then choose a disease from a drop-down list with autocomplete functionality that contains all DisGeNET diseases.
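As a rough illustration of the upload step, the sketch below parses a drug list and bins the available scores for a histogram. Each line of the list holds a drug name, optionally followed by a tab and its a priori score. The tab-separated layout, the bin count and the function names are assumptions; the text does not specify the CoDReS input format.

```python
from __future__ import annotations

from typing import Optional


def parse_drug_list(text: str) -> list[tuple[str, Optional[float]]]:
    """Read 'name' or 'name<TAB>score' lines; blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, score = line.partition("\t")
        rows.append((name.strip(), float(score) if score.strip() else None))
    return rows


def histogram(scores: list[float], bins: int = 10) -> list[int]:
    """Count scores in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins or 1.0  # all-equal scores: everything lands in the first bin
    counts = [0] * bins
    for s in scores:
        counts[min(int((s - lo) / width), bins - 1)] += 1  # clamp the maximum into the last bin
    return counts


if __name__ == "__main__":
    drugs = parse_drug_list("drugA\t0.92\ndrugB\ndrugC\t-0.31\n")
    scores = [s for _, s in drugs if s is not None]
    print(drugs)
    print(histogram(scores, bins=4))
```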

Fig. 3

Input score diagrams are drawn after the user uploads a drug list with their respective scores as returned by any drug repurposing tool.

The output of CoDReS is then rendered in tabular form. It can be sorted, printed and downloaded as a plain text, CSV, spreadsheet or PDF file. The main CoDReS output table (Fig. 4) lists, for each input drug:
- its CoDReS rank;
- its initial position in the input list;
- its input name;
- its DrugBank usual name and identifier (or the input name and an "unassigned" identifier, respectively, if it is not found in DrugBank's synonym list);
- its normalized score per category;
- its normalized CoDReS, by which the table is sorted in descending order.
A drug-score diagram for each scoring parameter is also displayed at the bottom of the page after execution (Fig. 5).
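The layout of the main output table can be mimicked with the sketch below. It records each drug's initial position, sorts by CoDReS in descending order, assigns ranks and serializes the result as CSV, one of the download formats mentioned above. The column keys and the function names are assumptions chosen to mirror the description, not the tool's actual export schema.

```python
import csv
import io

# Hypothetical column keys mirroring the columns described for the output table.
COLUMNS = ["rank", "initial_position", "input_name", "drugbank_name",
           "drugbank_id", "aS", "FS", "StS", "CoDReS"]


def rank_rows(rows):
    """Take rows in input order; return copies sorted by CoDReS (highest first) with ranks."""
    ranked = [dict(row, initial_position=i) for i, row in enumerate(rows, start=1)]
    ranked.sort(key=lambda row: row["CoDReS"], reverse=True)  # stable: ties keep input order
    for rank, row in enumerate(ranked, start=1):
        row["rank"] = rank
    return ranked


def to_csv(ranked):
    """Serialize the ranked table with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(ranked)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"input_name": "x", "drugbank_name": "X", "drugbank_id": "DBX0001",
         "aS": 0.2, "FS": 0.1, "StS": 1.0, "CoDReS": 0.43},
        {"input_name": "y", "drugbank_name": "y", "drugbank_id": "unassigned",
         "aS": 1.0, "FS": 0.9, "StS": 0.5, "CoDReS": 0.8},
    ]
    print(to_csv(rank_rows(rows)))
```

Copying each row before adding the position and rank leaves the caller's input list untouched. Python's stable sort keeps tied drugs in their original input order.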

Fig. 4

Main CoDReS output matrix.

Fig. 5

Drug score diagrams for each scoring parameter.

If repoDB lists failed clinical trials for the selected disease, a similarity matrix of all input drugs against the failed drugs is returned to the user. In this matrix, the columns represent the failed drugs and the rows the input drugs (Fig. 6).
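The failed-drug similarity matrix can be sketched with the Tanimoto coefficient on binary fingerprints, |A ∩ B| / |A ∪ B|. CoDReS computes it with Rcpi on compact E-State fingerprints. In this sketch, each fingerprint is instead a toy set of "on" bit positions. The drug names, the bit sets and the convention of returning 0.0 for two empty fingerprints are assumptions.

```python
from __future__ import annotations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints: 0.0 by convention here


def failed_drug_matrix(inputs: dict[str, frozenset[int]],
                       failed: dict[str, frozenset[int]]) -> dict[str, dict[str, float]]:
    """Rows are input drugs, columns are clinically failed drugs."""
    return {drug: {bad: tanimoto(fp, bad_fp) for bad, bad_fp in failed.items()}
            for drug, fp in inputs.items()}


if __name__ == "__main__":
    inputs = {"drugA": frozenset({1, 2, 3, 8}), "drugB": frozenset({4, 5})}
    failed = {"failed1": frozenset({2, 3, 8, 9}), "failed2": frozenset({5, 6})}
    for drug, row in failed_drug_matrix(inputs, failed).items():
        print(drug, {bad: round(sim, 2) for bad, sim in row.items()})
```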

Fig. 6

Structural similarity of input drugs (rows) to clinically failed drugs of input disease (columns) as found in repoDB.

Results and Validation

To check the validity of the CoDReS results, we ran examples without a priori scores. We chose the forty DisGeNET diseases with the most associated genes that also have at least twenty drug candidates in MalaCards [20]. These diseases are listed in Table 2.

For each disease, we created a mixed list of two hundred drugs. 95% of the list (190 drugs) were randomly selected from DrugBank. The remaining 5% (10 drugs) were the top drugs reported in the MalaCards repository as developed or used for the selected disease. After executing CoDReS for each experiment, we counted how many of the actual disease-related drugs appeared in the top 5% of the drugs ranked by CoDReS. We then calculated a p-value with a hypergeometric distribution test. We repeated this procedure one hundred times for each disease and then calculated the median, maximum, minimum and average p-value for each disease.

CoDReS ranked the input drugs effectively (median p-value < 0.05) for 35 of the 40 diseases. It failed to rank the drugs correctly for the remaining five diseases. This failure can be partially explained by the top ten drugs for most of these diseases. These include abstract substances or generic categories, such as "Anti-Inflammatory Agents", "Cytochrome P-450 Enzyme Inhibitors" and "Immunologic Factors", or drugs with close to zero gene targets participating in the disease. The statistical results are presented in Table 3.
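The enrichment test above can be written out with exact binomial coefficients. With 200 drugs, 10 of them disease-related and the top 10 (5%) inspected, the p-value of observing k or more disease-related drugs in the top 10 is the upper tail of a hypergeometric distribution. The sketch below is a stdlib-only Python illustration. The function names and the helpers that count hits and summarize repeated runs are assumptions, and the exact test implementation used by the authors is not stated.

```python
from __future__ import annotations

from math import comb
from statistics import mean, median


def hypergeom_pvalue(hits: int, population: int, successes: int, draws: int) -> float:
    """P(X >= hits) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(hits, upper + 1))
    return tail / comb(population, draws)


def hits_in_top(ranked: list[str], true_drugs: set[str], fraction: float = 0.05) -> int:
    """Number of known disease drugs among the top `fraction` of a ranked list."""
    top_n = round(fraction * len(ranked))
    return len(set(ranked[:top_n]) & true_drugs)


def summarize(p_values: list[float]) -> dict[str, float]:
    """Median, maximum, minimum and average p-value over repeated runs."""
    return {"median": median(p_values), "max": max(p_values),
            "min": min(p_values), "average": mean(p_values)}


if __name__ == "__main__":
    # 200 drugs in the list, 10 disease-related, top 10 inspected.
    for k in range(4):
        print(k, f"{hypergeom_pvalue(k, population=200, successes=10, draws=10):.4f}")
```

With these settings, three or more disease-related drugs in the top ten already give p < 0.05, whereas two do not.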

Table 2

Disease name	UMLS ID	Gene count	MalaCards name
Malignant neoplasm of breast	C0006142	5053	Breast Cancer
Liver carcinoma	C2239176	3592	Hepatocellular Carcinoma
Colorectal Cancer	C1527249	3298	Colorectal Cancer
Malignant neoplasm of prostate	C0376358	3238	Prostate Cancer
Carcinoma of lung	C0684249	2475	Lung Cancer
melanoma	C0025202	2453	Melanoma
Malignant neoplasm of stomach	C0024623	2397	Gastric Cancer
Glioma	C0017638	2210	Glioma
Ovarian Carcinoma	C0029925	2202	Ovarian Cancer
Alzheimer's Disease	C0002395	1981	Alzheimer Disease
leukemia	C0023418	1940	Leukemia
Glioblastoma	C0017636	1936	Glioblastoma
Schizophrenia	C0036341	1922	Schizophrenia
Squamous cell carcinoma	C0007137	1875	Squamous Cell Carcinoma
Pancreatic carcinoma	C0235974	1868	Pancreatic Cancer
Rheumatoid Arthritis	C0003873	1832	Rheumatoid Arthritis
Adenocarcinoma	C0001418	1711	Adenocarcinoma
Leukemia, Myelocytic, Acute	C0023467	1702	Leukemia, Acute Myeloid
Neuroblastoma	C0027819	1698	Neuroblastoma
Diabetes Mellitus, Non-Insulin-Dependent	C0011860	1671	Diabetes Mellitus, Noninsulin-Dependent
Diabetes Mellitus	C0011849	1506	Diabetes Mellitus
Renal Cell Carcinoma	C0007134	1347	Renal Cell Carcinoma, Papillary, 1
Asthma	C0004096	1312	Asthma
Multiple Myeloma	C0026764	1311	Myeloma, Multiple
Hypertensive disease	C0020538	1309	Hypertension, Essential
Lymphoma	C0024299	1306	Lymphoma
Bladder Neoplasm	C0005695	1216	Bladder Cancer
Epilepsy	C0014544	1176	Epilepsy
Seizures	C0036572	1173	Seizure Disorder
Chronic Lymphocytic Leukemia	C0023434	1119	Leukemia, Chronic Lymphocytic
Lupus Erythematosus, Systemic	C0024141	1112	Systemic Lupus Erythematosus
Multiple Sclerosis	C0026769	1105	Multiple Sclerosis
Cervix carcinoma	C0302592	1104	Cervix carcinoma
Osteosarcoma	C0029463	1102	Osteogenic Sarcoma
Arteriosclerosis	C0003850	1086	Arteriosclerosis
Autoimmune Diseases	C0004364	1059	Autoimmune Disease
Osteosarcoma of bone	C0585442	1041	Bone Osteosarcoma
Squamous cell carcinoma of esophagus	C0279626	1022	Esophagus Squamous Cell Carcinoma
Adenoma	C0001430	999	Adenoma
Coronary Artery Disease	C1956346	980	Coronary Artery Anomaly

Table 3

The median, maximum, minimum and average p-values of 100 CoDReS executions for each disease, as calculated by hypergeometric distribution tests. Median p-values above 0.05 are highlighted in red.

Table 2 legend: Information on the diseases used for validation. The first two columns give the disease name and UMLS ID as found in DisGeNET. The third column gives the total number of genes that participate in the disease, and the fourth column the disease name as returned by MalaCards.

Discussion

In this article we present CoDReS, a drug (re-)ranking tool that can post-filter drug lists generated by conventional drug repurposing tools or by any other drug discovery pipeline. CoDReS should be used as a means of suggesting the best candidates for in vitro or clinical studies, by combining a priori knowledge with functional and structural information. In the in silico validation scheme presented in the previous section, CoDReS brought the disease-related drugs to the top of the random drug pool for 35 of the 40 diseases tested. Despite these promising results, this scheme is only a computational validation of the tool's capabilities. Ultimately, scientists using the tool should always draw on their own knowledge, expertise and the literature to decide on the best drug candidates for further experiments.

Declaration of Competing Interest

None declared.

References (10 of 20 shown)

3. ChemMine tools: an online service for analyzing and clustering small molecules.

Authors: Tyler W H Backman; Yiqun Cao; Thomas Girke
Journal: Nucleic Acids Res Date: 2011-05-16 Impact factor: 16.971

4. Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions.

Authors: Dong-Sheng Cao; Nan Xiao; Qing-Song Xu; Alex F Chen
Journal: Bioinformatics Date: 2014-09-22 Impact factor: 6.937

6. Drug repurposing in idiopathic pulmonary fibrosis filtered by a bioinformatics-derived composite score.

Authors: E Karatzas; M M Bourdakou; G Kolios; G M Spyrou
Journal: Sci Rep Date: 2017-10-03 Impact factor: 4.379

9. Molecular properties that influence the oral bioavailability of drug candidates.

Authors: Daniel F Veber; Stephen R Johnson; Hung-Yuan Cheng; Brian R Smith; Keith W Ward; Kenneth D Kopple
Journal: J Med Chem Date: 2002-06-06 Impact factor: 7.446

10. DrugBank 5.0: a major update to the DrugBank database for 2018.

Authors: David S Wishart; Yannick D Feunang; An C Guo; Elvis J Lo; Ana Marcu; Jason R Grant; Tanvir Sajed; Daniel Johnson; Carin Li; Zinat Sayeeda; Nazanin Assempour; Ithayavani Iynkkaran; Yifeng Liu; Adam Maciejewski; Nicola Gale; Alex Wilson; Lucy Chin; Ryan Cummings; Diana Le; Allison Pon; Craig Knox; Michael Wilson
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

11. Genenames.org: the HGNC and VGNC resources in 2017.

Authors: Bethan Yates; Bryony Braschi; Kristian A Gray; Ruth L Seal; Susan Tweedie; Elspeth A Bruford
Journal: Nucleic Acids Res Date: 2016-10-30 Impact factor: 16.971

15. The ChEMBL database in 2017.

Authors: Anna Gaulton; Anne Hersey; Michał Nowotka; A Patrícia Bento; Jon Chambers; David Mendez; Prudence Mutowo; Francis Atkinson; Louisa J Bellis; Elena Cibrián-Uhalte; Mark Davies; Nathan Dedman; Anneli Karlsson; María Paula Magariños; John P Overington; George Papadatos; Ines Smit; Andrew R Leach
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971

17. DGIdb 3.0: a redesign and expansion of the drug-gene interaction database.

Authors: Kelsy C Cotto; Alex H Wagner; Yang-Yang Feng; Susanna Kiwala; Adam C Coffman; Gregory Spies; Alex Wollam; Nicholas C Spies; Obi L Griffith; Malachi Griffith
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

18. APCluster: an R package for affinity propagation clustering.

Authors: Ulrich Bodenhofer; Andreas Kothmeier; Sepp Hochreiter
Journal: Bioinformatics Date: 2011-07-06 Impact factor: 6.937

19. A standard database for drug repositioning.

Authors: Adam S Brown; Chirag J Patel
Journal: Sci Data Date: 2017-03-14 Impact factor: 6.444
