
A Web Tool for Ranking Candidate Drugs Against a Selected Disease Based on a Combination of Functional and Structural Criteria.

Evangelos Karatzas1,2, George Minadakis2,3, George Kolios4, Alex Delis1, George M Spyrou2,3.   

Abstract

Drug repurposing techniques allow existing drugs to be tested against diseases outside their initial spectrum, reducing cost and shortening the long time-frames of new drug development. In silico drug repurposing further speeds up the process, either by proposing drugs suitable to invert the transcriptomic profile of a disease or by indicating drugs based on their common targets or structural similarity with other drugs that share a similar mode of action. Such methods usually return a number of potential repurposed drugs that need to be tested against the disease in in vitro, pre-clinical and clinical studies. Thus, a more sophisticated ranking of candidate drugs is crucial, so that testing can start from the most promising chemical substances. To support this decision process, we present CoDReS (Composite Drug Reranking Scoring), a web-based drug (re-)ranking tool that combines an initial drug ranking (i.e. a repurposing score or hypothesis/potentiality score) with a functional score of each drug considered in conjunction with the disease under study, as well as with a structural score derived from potential drug-likeness rule violations. Furthermore, structural similarity clustering is applied to the considered drugs, and a handful of structural exemplars are suggested for further in vitro and in vivo validation. The user can filter the results further by examining the structural similarity of the candidate drugs to drugs that have failed against the queried disease in related clinical trials. CoDReS is publicly available online at http://bioinformatics.cing.ac.cy/codres.


Keywords:  Cheminformatics; Data mining; Drug discovery; Drug ranking

Year:  2019        PMID: 31360332      PMCID: PMC6637175          DOI: 10.1016/j.csbj.2019.05.010

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   7.271


Introduction

Transcriptomic-based computational drug repurposing (DR) tools, such as Connectivity Map [1] and L1000CDS2 [2], compare a disease-related gene expression profile with a number of stored expression profiles corresponding to cellular responses to various perturbations. These tools return lists of candidate repurposed drugs, which can be ordered by their inhibition score. The inhibition score describes the potential of a chemical substance to revert the perturbed gene signature of a disease back to its “normal-healthy” state. Although the inhibition score might give insight into the potency of a drug against a disease, on its own it cannot guarantee success in a clinical trial. On the other hand, cheminformatics tools such as ChemMine Tools [3], and programming packages such as Rcpi [4] and ChemmineR [5], can suggest drugs whose structure, and possibly mode of action, is similar to that of drugs with a priori known effectiveness either against a specific disease-related mechanism or against diseases phenotypically similar to the targeted disease. However, the derived similarity score is often not enough to deem a drug an appropriate candidate against a disease. Other types of drug information ought to be examined, such as the candidate drug's functional relation to the disease, its binding affinity to any disease-related gene target, and its drug-likeness evaluation based on structural rules that might render the drug inappropriate for clinical trials.
To implement scoring for these different drug aspects and to provide a more meaningful ranking of candidate repurposed drugs, we have developed the CoDReS (Composite Drug Reranking Score) web-based tool, which builds on and extends the methodology introduced in [6] in the following ways: CoDReS integrates information from updated biological databases, incorporates binding affinity scores between ligands and proteins, evaluates drug-likeness, and presents structural similarities between input drugs and possible failed drugs that have already been tested against the queried disease in clinical trials. A summary of the CoDReS pipeline is depicted in Fig. 1.
Fig. 1

CoDReS summary figure.


Tool Description

Scoring Scheme

A composite score (hereafter referred to as CoDReS) is calculated for each drug as the normalized weighted sum of the initial a priori score (aS), a functional score (FS) and a structural score (StS):

$$\mathrm{CoDReS} = \frac{w_{aS}\,\mathrm{aS} + w_{FS}\,\mathrm{FS} + w_{StS}\,\mathrm{StS}}{w_{aS} + w_{FS} + w_{StS}}$$

The weights $w_{aS}$, $w_{FS}$ and $w_{StS}$ are user-defined parameters that set the desired influence of each component (a priori, functional and structural score, respectively) on the final score; by default they are equal. The a priori scores can be uploaded by the user and are automatically normalized to the unit interval [0, 1] by dividing by the maximum absolute a priori score.

The functional score requires two parameters: the Confidence Score (CS), which reflects the gene-disease association, and Ki, the inhibitory constant (measured in nM), which represents the reciprocal of the binding affinity between the inhibitor (drug) and the enzyme (target) [7]; the smaller the Ki, the greater the binding affinity. For each drug d, FS is the sum, over every gene target of the drug that has been related to the queried disease, of the Confidence Score multiplied by the inverse of Ki:

$$\mathrm{FS}_d = \sum_{g \in T_d} \frac{\mathrm{CS}_g}{K_i(d, g)}$$

where $T_d$ denotes the disease-related gene targets of drug d. Each drug's FS is finally normalized to [0, 1] by dividing by the maximum FS.

The structural score measures a substance's drug-likeness based on Lipinski's “rule of five” [8] and Veber's rule [9]. According to the Lipinski rules, for a drug to be orally active in humans it should (i) have ≤5 hydrogen bond donors, (ii) have ≤10 hydrogen bond acceptors, (iii) weigh <500 Da and (iv) have an octanol-water partition coefficient (log P) ≤5. Veber's rule further requires that the chemical substance (v) contain ≤10 rotatable bonds and (vi) have a polar surface area not exceeding 140 Å². The final StS for each drug is a value within the range [0, 1], calculated as

$$\mathrm{StS} = \frac{6 - V}{6}$$

where V is the number of rules the drug violates and “6” is the maximum number of structural rules that a drug might violate.
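As an illustration of how the three components combine, the Python sketch below follows the scoring scheme described above. It assumes that the weighted sum is normalized by the sum of the weights, that aS and FS are normalized by dividing by their maximum absolute value, and that StS is (6 - V)/6 for V violated rules. The drug names and score values are hypothetical; this is not the CoDReS back end, which is written in R.

```python
"""Minimal sketch of a CoDReS-style composite score (hypothetical inputs)."""
from typing import Dict


def normalize_by_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide each score by the largest absolute score, as described for aS and FS."""
    peak = max((abs(v) for v in scores.values()), default=0.0)
    if peak == 0:
        # Avoid division by zero when every score is zero.
        return {drug: 0.0 for drug in scores}
    return {drug: value / peak for drug, value in scores.items()}


def structural_score(violations: int, max_rules: int = 6) -> float:
    """StS = (6 - V) / 6 for V violated Lipinski/Veber rules."""
    if not 0 <= violations <= max_rules:
        raise ValueError(f"violations must be between 0 and {max_rules}")
    return (max_rules - violations) / max_rules


def composite_scores(
    a_priori: Dict[str, float],
    raw_fs: Dict[str, float],
    violations: Dict[str, int],
    w_as: float = 1.0,
    w_fs: float = 1.0,
    w_sts: float = 1.0,
) -> Dict[str, float]:
    """Weighted sum of normalized aS, FS and StS, divided by the sum of the weights."""
    a_s = normalize_by_max(a_priori)
    fs = normalize_by_max(raw_fs)
    total = w_as + w_fs + w_sts
    return {
        drug: (w_as * a_s[drug] + w_fs * fs[drug]
               + w_sts * structural_score(violations[drug])) / total
        for drug in a_priori
    }


if __name__ == "__main__":
    scores = composite_scores(
        a_priori={"drugA": 0.8, "drugB": 0.4, "drugC": 0.2},
        raw_fs={"drugA": 0.002, "drugB": 0.01, "drugC": 0.0},
        violations={"drugA": 0, "drugB": 3, "drugC": 6},
    )
    # Print drugs sorted by composite score, highest first.
    for drug, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{drug}\t{score:.3f}")
```

With equal weights, this example ranks drugA first, then drugB, then drugC. Raising one weight (e.g. `w_fs=2.0`) simply shifts the ranking toward that component while keeping the composite score in [0, 1].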

Development

The static components of the user interface (UI) of the CoDReS web application are developed in PHP, HTML, CSS (Bootstrap) and JavaScript (AJAX), while the dynamic components of the UI are refreshed via PHP and back-end R scripts. Several data repositories have been downloaded, parsed and integrated into a MySQL database, which in turn serves the CoDReS web application. Information regarding the database releases, versions and links can be found in Table 1.
Table 1

Information regarding resources integrated into CoDReS.

Database Name | Link | File | Current Version | Last Update
BindingDB | https://www.bindingdb.org/bind/chemsearch/marvin/SDFdownload.jsp?all_download=yes | BindingDB_All_2019m1.tsv.zip | - | 2019/02
CheMBL | ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/ | chembl_24_1_mysql.tar.gz | 24.1 | 2018/06
DGIdb | http://www.dgidb.org/downloads | Interactions TSV | 3.0.2 | 2018/01
DisGeNET | http://www.disgenet.org/downloads | ALL gene-disease associations | 6.0 | 2019/02
DrugBank | https://www.drugbank.ca/releases/latest#open-data | DrugBank Vocabulary | 5.1.2 | 2018/12
DrugBank | https://www.drugbank.ca/releases/latest#protein-identifiers | Drug Target Identifiers - All | 5.1.2 | 2018/12
DrugBank | https://www.drugbank.ca/releases/latest#structures | Structural External Links - All | 5.1.2 | 2018/12
DrugCentral | http://drugcentral.org/download | Drug-target interaction | 10.4 | 2018/08
HGNC | https://www.genenames.org/cgi-bin/download | Approved Symbol, Synonyms | - | 2019/02
repoDB | http://apps.chiragjpgroup.org/repoDB/ | full repoDB dataset | 1.2 | 2017/07
Uniprot | ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/ | HUMAN_9606_idmapping.dat | - | 2019/02
CoDReS works with drug synonyms downloaded from DrugBank [10], a drug-centric online database that provides detailed information on drugs and their gene targets. All other integrated databases that include drug names were parsed, and their drug names were translated to DrugBank's usual names. DrugBank identifiers were also assigned to each input drug where applicable; otherwise, an “unassigned” value was given. At the gene level, CoDReS works on gene synonyms derived from the HGNC [11] database, and every other database that contains gene identifiers is parsed and translated according to the HGNC gene synonyms. The backend of CoDReS is developed in R. The first FS parameter, the Confidence Score, is taken from DisGeNET [12], an online database that links genes to diseases by integrating information from various biological databases and scoring each association. The Ki value of a drug-protein pair is queried from BindingDB [13], another online database containing experimentally validated binding affinities between proteins and ligands. To link the databases properly, genes are converted to proteins through Uniprot [14], and by querying BindingDB the proteins are linked to drug identifiers from either DrugBank or CheMBL [15]. If no Ki value is found for a drug-protein pair, the application uses the median Ki value of BindingDB (184 nM) instead of the mean (120523.41 nM), which is inflated by the database's outliers. Uniprot is an online knowledgebase that hosts annotated sequences of over 120 million proteins and provides protein visualization methods. CheMBL is another drug-related database similar to DrugBank. The gene targets of the input drugs are found in the parsed DrugBank, DrugCentral [16] and DGIdb [17] databases.
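The lookups described in the previous paragraph can be sketched in Python as follows. The synonym table, gene names and values are hypothetical, the case-insensitive name matching is an assumption, and the 184 nM fallback is the BindingDB median quoted in the text. The sketch computes the raw (unnormalized) FS only; normalization by the maximum FS would follow.

```python
from typing import Dict, Iterable, Tuple

# Median BindingDB Ki in nM, as reported in the text; used when a pair has no Ki.
MEDIAN_KI_NM = 184.0


def to_drugbank(name: str, synonyms: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return (DrugBank name, DrugBank ID), or (input name, "unassigned") if unknown.

    `synonyms` is a hypothetical lookup keyed by lower-cased synonym.
    """
    return synonyms.get(name.lower(), (name, "unassigned"))


def functional_score(
    drug: str,
    targets: Iterable[str],
    confidence: Dict[str, float],
    ki_nm: Dict[Tuple[str, str], float],
    fallback_ki_nm: float = MEDIAN_KI_NM,
) -> float:
    """Raw FS: sum of CS / Ki over the drug's targets associated with the disease."""
    total = 0.0
    for gene in targets:
        cs = confidence.get(gene)
        if cs is None:
            # Target is not associated with the queried disease: no contribution.
            continue
        ki = ki_nm.get((drug, gene), fallback_ki_nm)
        total += cs / ki
    return total


if __name__ == "__main__":
    synonyms = {"drug x": ("DrugX", "DB_HYPOTHETICAL")}  # hypothetical entry
    print(to_drugbank("Drug X", synonyms))
    print(to_drugbank("Unknown compound", synonyms))
    fs = functional_score(
        drug="DrugX",
        targets=["GENE_A", "GENE_B", "GENE_C"],
        confidence={"GENE_A": 0.6, "GENE_B": 0.3},  # hypothetical CS values
        ki_nm={("DrugX", "GENE_A"): 12.0},           # GENE_B falls back to 184 nM
    )
    print(f"raw FS = {fs:.4f}")
```

Using the median rather than the mean as the fallback keeps a missing Ki from being treated as an extremely weak binder, which is the point the text makes about BindingDB's outliers.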
The StS of each drug is calculated with the Rcpi R package and requires the drug's molecular structure as input. Each molecular structure is derived from the drug's simplified molecular-input line-entry system (SMILES) string, a line notation that describes the structure of chemical species using short ASCII strings. CoDReS maps either DrugBank identifiers or CheMBL drug names to SMILES. For every violated rule (the Lipinski and Veber rules above), a drug's violation count increases by one, up to a maximum of six violations. If no SMILES is available for a drug, the candidate is conservatively assigned an StS of zero, adopting the worst-case scenario of the maximum number of violations. Another important aspect of CoDReS is that it highlights the highest-ranked drug of each structural cluster as an exemplar, by applying affinity propagation clustering via the R package APCluster [18] to the similarity matrix of the input drugs' fingerprints. Specifically, the calcDrugFPSim function of the Rcpi package is used to calculate the similarity matrix, with a compact E-State fragments fingerprint type and the Tanimoto metric as arguments. The structural exemplars are presented as good candidate disease inhibitors for further investigation, since different structural properties might target different biological mechanisms of a disease phenotype. Finally, if clinical trials carried out for the queried disease have led to failed drugs, the structural similarity between these compounds and the input drug list is measured. For this purpose, the repoDB [19] online dataset has been parsed, keeping the suspended, terminated and withdrawn drugs for each disease identifier. The execution pipeline, together with all the integrated databases and packages, is depicted in Fig. 2.
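The violation counting can be sketched in Python as below. CoDReS computes descriptors from SMILES with Rcpi; here the descriptor values are assumed to be precomputed elsewhere and are passed in a hypothetical `Descriptors` record, with `None` standing for a drug that has no SMILES. The thresholds are the six Lipinski and Veber rules listed in the scoring scheme.

```python
from dataclasses import dataclass
from typing import Optional

MAX_VIOLATIONS = 6


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (assumed to be derived from SMILES elsewhere)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight_da: float
    log_p: float
    rotatable_bonds: int
    polar_surface_area_a2: float  # polar surface area in Å²


def count_violations(desc: Optional[Descriptors]) -> int:
    """Number of violated Lipinski (i-iv) and Veber (v-vi) rules."""
    if desc is None:
        # No SMILES available: conservatively assume every rule is violated.
        return MAX_VIOLATIONS
    rules_met = [
        desc.h_bond_donors <= 5,            # (i)
        desc.h_bond_acceptors <= 10,        # (ii)
        desc.molecular_weight_da < 500,     # (iii)
        desc.log_p <= 5,                    # (iv)
        desc.rotatable_bonds <= 10,         # (v)
        desc.polar_surface_area_a2 <= 140,  # (vi)
    ]
    return sum(1 for ok in rules_met if not ok)


def sts_from_descriptors(desc: Optional[Descriptors]) -> float:
    """StS = (6 - V) / 6, so a drug without SMILES scores 0."""
    return (MAX_VIOLATIONS - count_violations(desc)) / MAX_VIOLATIONS


if __name__ == "__main__":
    small = Descriptors(2, 4, 180.2, 1.2, 3, 63.6)     # illustrative values
    large = Descriptors(7, 14, 812.0, 6.3, 15, 210.0)  # illustrative values
    for label, desc in [("small", small), ("large", large), ("no SMILES", None)]:
        print(label, count_violations(desc), round(sts_from_descriptors(desc), 3))
```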
Fig. 2

CoDReS integration scheme.


User Execution

The user is required to upload a file containing the drug names and, optionally, their respective a priori scores, as obtained, for example, from a drug repurposing tool. As soon as the input file is uploaded, a histogram and a distribution diagram of the input scores are generated (Fig. 3). The weights denoting the importance of aS, FS and StS are user-selected and have equal default values. The user must then choose a disease from an auto-complete select box listing all DisGeNET diseases.
Fig. 3

Input score diagrams are drawn after the user uploads a drug list with their respective scores as returned by any drug repurposing tool.

The output of CoDReS is then rendered in tabular form and can be sorted, printed and downloaded as a plain text, CSV, spreadsheet or PDF file. The main CoDReS output table consists of the CoDReS rank, the initial position of each input drug, its input name, its DrugBank usual name and identifier (or the input name again and an “unassigned” identifier, respectively, if it is not found in DrugBank's synonym list), its normalized score per category and its normalized CoDReS, by which the drugs are sorted in descending order (Fig. 4). A drug-score diagram for each scoring parameter is also displayed at the bottom of the page after execution (Fig. 5).
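A minimal Python sketch of assembling such an output table is shown below. The `ResultRow` record and the column labels are hypothetical stand-ins for the columns listed in the text; the sketch only sorts rows by CoDReS in descending order, assigns a 1-based rank, and serializes them as CSV.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class ResultRow:
    input_position: int  # 1-based position in the uploaded list
    input_name: str
    drugbank_name: str   # input name again when no DrugBank match exists
    drugbank_id: str     # "unassigned" when no DrugBank match exists
    a_s: float
    fs: float
    sts: float
    codres: float


# Hypothetical column labels mirroring the output table described in the text.
HEADER = ["CoDReS rank", "Initial position", "Input name", "DrugBank name",
          "DrugBank ID", "aS", "FS", "StS", "CoDReS"]


def rank_rows(rows: List[ResultRow]) -> List[ResultRow]:
    """Sort rows by CoDReS in descending order; ties keep their input order."""
    return sorted(rows, key=lambda r: r.codres, reverse=True)


def to_csv(rows: List[ResultRow]) -> str:
    """Serialize ranked rows to CSV text with a 1-based CoDReS rank column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for rank, r in enumerate(rank_rows(rows), start=1):
        writer.writerow([rank, r.input_position, r.input_name, r.drugbank_name,
                         r.drugbank_id, f"{r.a_s:.3f}", f"{r.fs:.3f}",
                         f"{r.sts:.3f}", f"{r.codres:.3f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        ResultRow(1, "compound x", "compound x", "unassigned", 0.2, 0.1, 0.5, 0.267),
        ResultRow(2, "drug y", "DrugY", "DB_HYPOTHETICAL", 0.9, 0.6, 1.0, 0.833),
    ]
    print(to_csv(rows))
```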
Fig. 4

Main CoDReS output matrix.

Fig. 5

Drug score diagrams for each scoring parameter.

If repoDB lists failed clinical trials for the selected disease, a similarity matrix of all input drugs against the failed drugs is returned to the user, with the failed drugs as columns and the input drugs as rows (Fig. 6).
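The shape of this matrix can be illustrated with a short Python sketch. CoDReS itself uses E-State fingerprints and the Tanimoto metric through Rcpi; here, as an assumption, each fingerprint is represented as a set of "on" bit indices, and the drug names and bits are hypothetical.

```python
from typing import Dict, FrozenSet

Fingerprint = FrozenSet[int]  # indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both fingerprints are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def failed_drug_similarity(
    inputs: Dict[str, Fingerprint],
    failed: Dict[str, Fingerprint],
) -> Dict[str, Dict[str, float]]:
    """Similarity matrix with input drugs as rows and failed drugs as columns."""
    return {
        drug: {name: tanimoto(fp, failed_fp) for name, failed_fp in failed.items()}
        for drug, fp in inputs.items()
    }


if __name__ == "__main__":
    inputs = {"drugA": frozenset({1, 2, 3}), "drugB": frozenset({7, 8})}
    failed = {"failed1": frozenset({2, 3, 4}), "failed2": frozenset({8, 9})}
    matrix = failed_drug_similarity(inputs, failed)
    print("\t" + "\t".join(failed))
    for drug, row in matrix.items():
        print(drug + "\t" + "\t".join(f"{row[name]:.2f}" for name in failed))
```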
Fig. 6

Structural similarity of input drugs (rows) to clinically failed drugs of input disease (columns) as found in repoDB.


Results and Validation

To check the validity of the CoDReS results, we ran examples without a priori scores. We chose the forty DisGeNET diseases with the most associated genes that also have at least twenty drug candidates in Malacards [20]. These diseases are listed in Table 2. For each disease, we created a mixed list of two hundred drugs: 95% (190 drugs) randomly selected from DrugBank and 5% (10 drugs) taken from the top drugs reported in the Malacards repository as developed for or used against the selected disease. After executing CoDReS for each experiment, we counted how many of the actual disease-related drugs appeared in the top 5% of the drugs ranked by CoDReS, and calculated a p-value through a hypergeometric distribution test. We repeated this procedure a hundred times for each disease and then calculated the median, maximum, minimum and average p-value for each disease. CoDReS ranked the input drugs effectively (median p-value < 0.05) for 35 of the 40 diseases. CoDReS failed to rank the drugs correctly for the remaining five diseases; this failure can be partially explained by the fact that, for most of these diseases, the top ten drugs include abstract substances or generic categories such as “Anti-Inflammatory Agents”, “Cytochrome P-450 Enzyme Inhibitors” and “Immunologic Factors”, or drugs with close to zero gene targets participating in the disease. The statistical results are presented in Table 3.
Table 2

Information on the diseases used for the validation. The first two columns give the disease name and UMLS ID as found in DisGeNET, the third column the total number of genes that participate in the disease, and the fourth column the disease name as returned by Malacards.

Disease name | UMLS ID | Gene count | Malacards name
Malignant neoplasm of breast | C0006142 | 5053 | Breast Cancer
Liver carcinoma | C2239176 | 3592 | Hepatocellular Carcinoma
Colorectal Cancer | C1527249 | 3298 | Colorectal Cancer
Malignant neoplasm of prostate | C0376358 | 3238 | Prostate Cancer
Carcinoma of lung | C0684249 | 2475 | Lung Cancer
melanoma | C0025202 | 2453 | Melanoma
Malignant neoplasm of stomach | C0024623 | 2397 | Gastric Cancer
Glioma | C0017638 | 2210 | Glioma
Ovarian Carcinoma | C0029925 | 2202 | Ovarian Cancer
Alzheimer's Disease | C0002395 | 1981 | Alzheimer Disease
leukemia | C0023418 | 1940 | Leukemia
Glioblastoma | C0017636 | 1936 | Glioblastoma
Schizophrenia | C0036341 | 1922 | Schizophrenia
Squamous cell carcinoma | C0007137 | 1875 | Squamous Cell Carcinoma
Pancreatic carcinoma | C0235974 | 1868 | Pancreatic Cancer
Rheumatoid Arthritis | C0003873 | 1832 | Rheumatoid Arthritis
Adenocarcinoma | C0001418 | 1711 | Adenocarcinoma
Leukemia, Myelocytic, Acute | C0023467 | 1702 | Leukemia, Acute Myeloid
Neuroblastoma | C0027819 | 1698 | Neuroblastoma
Diabetes Mellitus, Non-Insulin-Dependent | C0011860 | 1671 | Diabetes Mellitus, Noninsulin-Dependent
Diabetes Mellitus | C0011849 | 1506 | Diabetes Mellitus
Renal Cell Carcinoma | C0007134 | 1347 | Renal Cell Carcinoma, Papillary, 1
Asthma | C0004096 | 1312 | Asthma
Multiple Myeloma | C0026764 | 1311 | Myeloma, Multiple
Hypertensive disease | C0020538 | 1309 | Hypertension, Essential
Lymphoma | C0024299 | 1306 | Lymphoma
Bladder Neoplasm | C0005695 | 1216 | Bladder Cancer
Epilepsy | C0014544 | 1176 | Epilepsy
Seizures | C0036572 | 1173 | Seizure Disorder
Chronic Lymphocytic Leukemia | C0023434 | 1119 | Leukemia, Chronic Lymphocytic
Lupus Erythematosus, Systemic | C0024141 | 1112 | Systemic Lupus Erythematosus
Multiple Sclerosis | C0026769 | 1105 | Multiple Sclerosis
Cervix carcinoma | C0302592 | 1104 | Cervix carcinoma
Osteosarcoma | C0029463 | 1102 | Osteogenic Sarcoma
Arteriosclerosis | C0003850 | 1086 | Arteriosclerosis
Autoimmune Diseases | C0004364 | 1059 | Autoimmune Disease
Osteosarcoma of bone | C0585442 | 1041 | Bone Osteosarcoma
Squamous cell carcinoma of esophagus | C0279626 | 1022 | Esophagus Squamous Cell Carcinoma
Adenoma | C0001430 | 999 | Adenoma
Coronary Artery Disease | C1956346 | 980 | Coronary Artery Anomaly
Table 3

The median, maximum, minimum and average p-values of 100 CoDReS executions for each disease, as calculated by hypergeometric distribution tests. Median p-values above 0.05 are shown in red.


Discussion

In this article we present CoDReS, a drug (re-)ranking tool that can act as a post-filter for drug lists generated either by conventional drug repurposing tools or by any other drug discovery pipeline. CoDReS should be used as a means of suggesting the best candidates for in vitro or clinical studies by combining a priori knowledge with functional and structural information. The in silico validation scheme of CoDReS, presented in the previous section, brought the disease-related drugs to the top of the random drug pool in most cases (35 of the 40 diseases tested). Despite these promising results, this scheme is only a computational validation of the tool's capabilities. Ultimately, scientists using the tool should always incorporate their own knowledge, expertise and the literature when deciding on the best drug candidates for further experiments.

Declaration of Competing Interest

None declared.
References

1.  APCluster: an R package for affinity propagation clustering.

Authors:  Ulrich Bodenhofer; Andreas Kothmeier; Sepp Hochreiter
Journal:  Bioinformatics       Date:  2011-07-06       Impact factor: 6.937

2.  Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions.

Authors:  Dong-Sheng Cao; Nan Xiao; Qing-Song Xu; Alex F Chen
Journal:  Bioinformatics       Date:  2014-09-22       Impact factor: 6.937

3.  Molecular properties that influence the oral bioavailability of drug candidates.

Authors:  Daniel F Veber; Stephen R Johnson; Hung-Yuan Cheng; Brian R Smith; Keith W Ward; Kenneth D Kopple
Journal:  J Med Chem       Date:  2002-06-06       Impact factor: 7.446

4.  ChemMine tools: an online service for analyzing and clustering small molecules.

Authors:  Tyler W H Backman; Yiqun Cao; Thomas Girke
Journal:  Nucleic Acids Res       Date:  2011-05-16       Impact factor: 16.971

5.  DrugBank 5.0: a major update to the DrugBank database for 2018.

Authors:  David S Wishart; Yannick D Feunang; An C Guo; Elvis J Lo; Ana Marcu; Jason R Grant; Tanvir Sajed; Daniel Johnson; Carin Li; Zinat Sayeeda; Nazanin Assempour; Ithayavani Iynkkaran; Yifeng Liu; Adam Maciejewski; Nicola Gale; Alex Wilson; Lucy Chin; Ryan Cummings; Diana Le; Allison Pon; Craig Knox; Michael Wilson
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

6.  A standard database for drug repositioning.

Authors:  Adam S Brown; Chirag J Patel
Journal:  Sci Data       Date:  2017-03-14       Impact factor: 6.444

7.  Genenames.org: the HGNC and VGNC resources in 2017.

Authors:  Bethan Yates; Bryony Braschi; Kristian A Gray; Ruth L Seal; Susan Tweedie; Elspeth A Bruford
Journal:  Nucleic Acids Res       Date:  2016-10-30       Impact factor: 16.971

8.  The ChEMBL database in 2017.

Authors:  Anna Gaulton; Anne Hersey; Michał Nowotka; A Patrícia Bento; Jon Chambers; David Mendez; Prudence Mutowo; Francis Atkinson; Louisa J Bellis; Elena Cibrián-Uhalte; Mark Davies; Nathan Dedman; Anneli Karlsson; María Paula Magariños; John P Overington; George Papadatos; Ines Smit; Andrew R Leach
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

9.  Drug repurposing in idiopathic pulmonary fibrosis filtered by a bioinformatics-derived composite score.

Authors:  E Karatzas; M M Bourdakou; G Kolios; G M Spyrou
Journal:  Sci Rep       Date:  2017-10-03       Impact factor: 4.379

10.  DGIdb 3.0: a redesign and expansion of the drug-gene interaction database.

Authors:  Kelsy C Cotto; Alex H Wagner; Yang-Yang Feng; Susanna Kiwala; Adam C Coffman; Gregory Spies; Alex Wollam; Nicholas C Spies; Obi L Griffith; Malachi Griffith
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971
