SuperPred: drug classification and target prediction.

Mathias Dunkel, Stefan Günther, Jessica Ahmed, Burghardt Wittig, Robert Preissner.

Abstract

The drug classification scheme of the World Health Organization (WHO) [Anatomical Therapeutic Chemical (ATC)-code] connects chemical classification and therapeutic approach. It is generally accepted that compounds with similar physicochemical properties exhibit similar biological activity. If this hypothesis holds true for drugs, then the ATC-code, the putative medical indication area and potentially the medical target should be predictable on the basis of structural similarity. We have validated that the prediction of the drug class is reliable for WHO-classified drugs. The reliability of the predicted medical effects of a compound increases with the number of (physico-)chemical properties it shares with a drug of known function. The web-server translates a user-defined molecule into a structural fingerprint that is compared to about 6300 drugs, which are enriched by 7300 links to molecular targets of the drugs, derived through text mining followed by manual curation. Links to the affected pathways are provided. The similarity to the medical compounds is expressed by the Tanimoto coefficient, which measures the structural similarity of two compounds. A similarity score higher than 0.85 results in a correct ATC prediction in 81% of all cases. Because the biological effect is well predictable when the structural similarity is sufficient, the web-server allows prognoses about the medical indication area of novel compounds and helps to find new leads for known targets. AVAILABILITY: The system is freely accessible at http://bioinformatics.charite.de/superpred. SuperPred can be obtained via a Creative Commons Attribution Noncommercial-Share Alike 3.0 License.

Year: 2008 PMID: 18499712 PMCID: PMC2447784 DOI: 10.1093/nar/gkn307

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The accessibility of large compound databases has changed: formerly exclusive in-house resources of large pharmaceutical companies, they are now publicly available (1). At present, several million different compounds can be obtained from different vendors (2). About 7000 drugs currently exist, and they address about 480 validated targets (3). The number of medical targets that favour interactions with drug-like chemical compounds is estimated at between 2200 and 3000 (4). To map these medical targets onto medical indication areas, a classification scheme is needed. Currently, the most commonly used classification system for drugs is the Anatomical Therapeutic Chemical (ATC) classification system. This scheme is recommended by the World Health Organization (WHO) for all global drug utilization studies and categorizes drug substances at different levels according to application area, therapeutic properties, and chemical and pharmacological properties (5). A challenging aim is the mapping of the available compounds onto about 850 ATC-classes. The progress in understanding the mechanisms of action of the vast majority of drugs gives the opportunity to narrow the gap between the medical indications and the elucidation of drug effects at the molecular level. The relation between the structure of a compound and its biological activity has been investigated in several systematic analyses (6–8). It was shown that a Tanimoto coefficient of >0.85 indicates that two molecules have similar activities (8). Based on this principle, it should be possible to predict medical indication areas for unclassified chemical compounds, provided the structural similarity is sufficient. A method based on the similar property principle (9) for predicting activity spectra of substances was described by Lagunin (10) and confirmed by several experiments (11,12). The PASS application is available at http://www.ibmc.msk.ru/PASS.
Furthermore, new medical indication areas for approved drugs or drug candidates can be found by applying this rule. Indeed, much effort has recently been put into drug repositioning (13,14). Property distributions and (physico-)chemical descriptors are already used successfully to discriminate between drugs and nondrugs (15,16). The increased knowledge about drug-target-pathway relations and the integration of molecular similarity with property distribution allow improved structure–function prediction. Here, we present a publicly available web-server to predict medical indication areas based on properties and similarity of chemical compounds.

METHODS

Data set for the web-server

The web-server, called SuperPred, was created to recognize as many drug classes as possible. For this reason, the number of medical compounds was enlarged to about 6300. The fingerprints calculated from the 2500 compounds of the SuperDrug database were used for a further structural screening against the SuperTarget database (17). In this way, 3800 additional compounds were detected that are structurally very similar to drugs, with Tanimoto coefficients of at least 0.85. These putative drugs are the most likely candidates for having the same mode of action, binding to the same target/enzyme and being assigned to the same medical indication as the WHO-classified drugs. To allow the examination of the drug effect on a molecular level, information about the target proteins was extracted from the literature and is provided for half of the drugs (17).

Reduced data set for prediction evaluation

For the statistical evaluation of the prediction accuracy, a subset consisting of 1035 drugs was used. The members of the subset were chosen according to the following rules: first, every drug having more than one indication was removed; second, each included ATC-group had to consist of at least 3 molecules; and last, to eliminate outliers, the drugs within one ATC-group were not allowed to deviate more than 1.5-fold from the average Tanimoto score of the group. Furthermore, ATC-codes with very similar indications were organized into ATC-classes. For instance, ‘corticosteroids, moderately potent’ (ATC: D07AB), ‘corticosteroids, potent’ (ATC: D07AC), ‘corticosteroids, very potent’ (ATC: D07AD) and ‘corticosteroids, plain’ (ATC: S01BA) were combined to form the ATC-class ‘corticosteroids’ (see website and Supplemental material).
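The selection rules above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "indication" can be modelled as the list of ATC codes of a drug, that the group-size filter is applied before codes are merged into classes, and it leaves out the outlier rule because its exact definition is not given in the text. The drug names and the function name `build_subset` are hypothetical; only the corticosteroid merge is taken from the example in the text.

```python
from collections import defaultdict

# Merge of similar ATC codes into one ATC-class, as in the example in the text.
ATC_CLASS = {
    "D07AB": "corticosteroids",
    "D07AC": "corticosteroids",
    "D07AD": "corticosteroids",
    "S01BA": "corticosteroids",
}

def build_subset(drugs, min_group_size=3):
    """Apply the first two selection rules and merge codes into classes.

    drugs: list of (name, [atc_codes]) pairs.
    Returns a dict mapping ATC-class -> list of drug names.
    The outlier rule (1.5-fold deviation) is intentionally not implemented.
    """
    groups = defaultdict(list)
    for name, atc_codes in drugs:
        if len(atc_codes) != 1:      # rule 1: drop drugs with several indications
            continue
        groups[atc_codes[0]].append(name)

    classes = defaultdict(list)
    for code, names in groups.items():
        if len(names) < min_group_size:   # rule 2: at least 3 molecules per group
            continue
        # Codes without a merge entry keep their own code as class label.
        classes[ATC_CLASS.get(code, code)].extend(names)
    return dict(classes)

# Hypothetical toy input (names are placeholders, not real drugs).
toy_drugs = [
    ("drug_a", ["D07AB"]), ("drug_b", ["D07AB"]), ("drug_c", ["D07AB"]),
    ("drug_d", ["D07AC"]), ("drug_e", ["D07AC"]), ("drug_f", ["D07AC"]),
    ("drug_g", ["D07AC", "S01BA"]),            # removed: two codes
    ("drug_h", ["C09AA"]), ("drug_i", ["C09AA"]),  # removed: group too small
]
subset = build_subset(toy_drugs)
print(subset)
```

In this toy input, only the six single-code corticosteroids survive and end up in one merged class.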

Prediction

The prediction was carried out by combining physicochemical property analysis with similarity searching. The ATC-class was predicted by assigning the compound to the ATC-class of the most similar drug, taking the property distribution into account. The prediction accuracy was determined by leave-one-out cross-validation.
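A minimal sketch of the similarity part of this procedure is given below: each compound is assigned the ATC-class of its most similar neighbour, and the accuracy is estimated by leave-one-out cross-validation. The property-distribution step is not modelled here, fingerprints are represented as sets of on-bit positions, and the toy data and function names are hypothetical.

```python
from __future__ import annotations

def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict_class(query: frozenset[int], reference: list) -> tuple:
    """Return (ATC-class, similarity) of the most similar reference drug."""
    best_class, best_score = None, -1.0
    for fingerprint, atc_class in reference:
        score = tanimoto(query, fingerprint)
        if score > best_score:
            best_class, best_score = atc_class, score
    return best_class, best_score

def leave_one_out_accuracy(dataset: list) -> float:
    """Predict every drug from all other drugs and count correct classes."""
    hits = 0
    for i, (fingerprint, true_class) in enumerate(dataset):
        reference = dataset[:i] + dataset[i + 1:]   # hold out drug i
        predicted, _ = predict_class(fingerprint, reference)
        hits += predicted == true_class
    return hits / len(dataset)

# Hypothetical toy data set: (fingerprint, ATC-class).
toy = [
    (frozenset({1, 2, 3, 4}), "corticosteroids"),
    (frozenset({1, 2, 3, 5}), "corticosteroids"),
    (frozenset({1, 2, 3, 6}), "corticosteroids"),
    (frozenset({10, 11, 12}), "ACE inhibitors"),
    (frozenset({10, 11, 13}), "ACE inhibitors"),
    (frozenset({10, 11, 14}), "ACE inhibitors"),
]
accuracy = leave_one_out_accuracy(toy)
print(accuracy)
```

Holding out one drug at a time ensures that a compound is never predicted from its own fingerprint, which would trivially give a similarity of 1.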

Physicochemical properties

Lipinski's Rule of Five (18) is a generally accepted standard for orally applicable drugs. The rule describes molecular properties important for a drug's absorption, distribution, metabolism and excretion in the human body. It states that an orally active drug has not more than 5 hydrogen-bond donors, not more than 10 hydrogen-bond acceptors, a molecular weight below 500 g/mol and a logP less than 5. These properties and several more were calculated for each drug in SuperPred. The distributions of the property values were saved in the database for each ATC-group and ATC-class. In this way, the range of property values of a group can be compared with the values of the query.
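The four thresholds stated above can be checked with a few lines of code. The sketch below assumes the property values have already been calculated by some cheminformatics toolkit; the class name `Properties`, the function name and the example values are hypothetical and do not describe a real compound.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Properties:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # g/mol
    log_p: float

def rule_of_five_violations(p: Properties) -> list[str]:
    """List which of the four criteria stated in the text are violated."""
    violations = []
    if p.h_bond_donors > 5:
        violations.append("more than 5 hydrogen-bond donors")
    if p.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen-bond acceptors")
    if p.molecular_weight >= 500:
        violations.append("molecular weight not below 500 g/mol")
    if p.log_p >= 5:
        violations.append("logP not less than 5")
    return violations

# Hypothetical property values for illustration only.
compliant = Properties(h_bond_donors=2, h_bond_acceptors=6,
                       molecular_weight=376.4, log_p=2.1)
bulky = Properties(h_bond_donors=6, h_bond_acceptors=9,
                   molecular_weight=620.0, log_p=3.0)

print(rule_of_five_violations(compliant))
print(rule_of_five_violations(bulky))
```

Returning the list of violated criteria rather than a single boolean keeps the result usable for a per-property comparison with an ATC-group.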

Similarity searches

To calculate the similarity between two compounds, their structural fingerprints, generated by the Chemistry Development Kit (CDK) (http://almost.cubic.uni-koeln.de/cdk/), were used. Structural fingerprints are bit-vectors encoding the chemical and topological features of small molecules. The similarity is determined by the Tanimoto coefficient (19):

$$T(a,b) = \frac{N_{ab}}{N_a + N_b - N_{ab}}$$

where $N_a$ is the number of bits set to 1 in compound a, $N_b$ is the number of bits set to 1 in compound b and $N_{ab}$ is the number of bits set to 1 in both compounds a and b.
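As a worked illustration, the Tanimoto coefficient, i.e. the number of common on-bits divided by the number of bits set in either fingerprint, can be computed as sketched below. The sketch assumes fingerprints are given as sets of on-bit positions; it does not call the CDK, and the toy bit positions are hypothetical.

```python
from __future__ import annotations

def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient N_ab / (N_a + N_b - N_ab).

    Each fingerprint is the set of bit positions that are set to 1.
    """
    n_a = len(fp_a)
    n_b = len(fp_b)
    n_ab = len(fp_a & fp_b)          # bits set in both fingerprints
    denominator = n_a + n_b - n_ab   # bits set in at least one fingerprint
    if denominator == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return n_ab / denominator

# Hypothetical toy fingerprints (bit positions are illustrative only).
query = {1, 4, 7, 9, 12}
drug = {1, 4, 7, 9, 15}
similarity = tanimoto(query, drug)
print(round(similarity, 3))
```

Here four bits are shared and six bits are set in at least one fingerprint, so the coefficient is 4/6. Identical fingerprints give 1, and fingerprints without common bits give 0.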

Input and output options

There are three ways to start a query with a molecule not included in the database: (i) enter a SMILES (Simplified Molecular Input Line Entry System) string, (ii) draw a molecule using Marvin Sketch or (iii) upload a MOL file using Marvin Sketch. Medical compounds can be retrieved through an expandable ATC-tree, by name, synonym, ATC-code or via a known target (name, Uniprot-ID). The output is a structured table listing the predictions (ATC-codes including confidence interval) together with similarity scores, compound-IDs, the molecular structure visualized by Marvin View, target information and the physicochemical property intervals of the ATC-group and of the query compound. The score and the color of each entry indicate the strength of the prediction.

RESULTS

Prediction results

The prediction accuracy is defined as the fraction of correct ATC-class predictions and amounts to 67.6%. The distribution of the fractions of correctly predicted indications is shown in Table 1. For a Tanimoto coefficient >0.85, an accuracy of 80.6% is achieved. A cumulative recall graph is shown in Figure 1. The graph shows the fraction of correct ATC-class predictions as a function of the number of retrieved structures. With three retrieved molecules the recall rises to about 80%, and with 20 retrieved molecules a recall of about 90% is achieved.

Table 1.

Distribution of the fractions of correctly predicted indications

Range of Tanimoto coefficient	Number of hits/misses	Fraction of hits (%)
0.4–0.5	5/18	21.7
0.5–0.6	18/27	40.0
0.6–0.7	40/60	40.0
0.7–0.8	93/84	52.5
0.8–0.9	171/58	74.7
0.9–1.0	367/79	82.3
0.0–1.0	700/335	67.6

For the reduced data set of 1035 drugs, 700 correct and 335 wrong predictions were obtained. In detail: a similarity score of 90–100% identifies the correct ATC-class in about 82% of cases (367 correct and 79 wrong predictions). A hit/miss ratio of about 3/1 is achieved for similarity scores of 70% and higher.
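The percentages in Table 1 follow directly from the hit and miss counts, as the short check below shows. The counts are copied from the table; the variable names are arbitrary.

```python
# Hits and misses per Tanimoto range, copied from Table 1.
table_1 = {
    "0.4-0.5": (5, 18),
    "0.5-0.6": (18, 27),
    "0.6-0.7": (40, 60),
    "0.7-0.8": (93, 84),
    "0.8-0.9": (171, 58),
    "0.9-1.0": (367, 79),
    "0.0-1.0": (700, 335),
}

# Fraction of hits in percent: hits / (hits + misses).
fractions = {
    label: 100 * hits / (hits + misses)
    for label, (hits, misses) in table_1.items()
}
for label, fraction in fractions.items():
    print(f"{label}: {fraction:.1f}%")

# Hit/miss ratio for similarity scores of 0.7 and higher.
high = ["0.7-0.8", "0.8-0.9", "0.9-1.0"]
hits_high = sum(table_1[label][0] for label in high)
misses_high = sum(table_1[label][1] for label in high)
print(f"hit/miss ratio for >= 0.7: {hits_high / misses_high:.2f}")
```

For scores of 0.7 and higher this gives 631 hits and 221 misses, a ratio of roughly 2.9, consistent with the "about 3/1" stated in the table note.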

Figure 1.

Cumulative recall for ATC-recognition relative to rank of retrieval.

For the reduced data set of 1035 drugs, the recall is cumulated from one retrieved drug up to twenty retrieved drugs. With three retrieved molecules the recall rises to about 80%, and with 20 retrieved molecules a cumulative recall of about 90% is achieved.
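One way to compute such a cumulative recall curve is sketched below. It assumes that a query counts as recalled at rank k if the correct ATC-class occurs among the k most similar retrieved drugs; this reading of the figure, the function name and the toy rankings are assumptions, not taken from the original implementation.

```python
from __future__ import annotations

def cumulative_recall(ranked_classes: list[list[str]],
                      true_classes: list[str],
                      max_rank: int) -> list[float]:
    """Recall at ranks 1..max_rank.

    ranked_classes[i] holds the ATC-classes of the drugs retrieved for
    query i, ordered by decreasing similarity.
    """
    recalls = []
    for k in range(1, max_rank + 1):
        found = sum(
            true in ranked[:k]
            for ranked, true in zip(ranked_classes, true_classes)
        )
        recalls.append(found / len(true_classes))
    return recalls

# Hypothetical rankings for four queries (class labels are placeholders).
ranked = [
    ["A", "B", "C"],
    ["B", "A", "A"],
    ["C", "C", "A"],
    ["B", "C", "D"],
]
truth = ["A", "A", "A", "D"]
recall_curve = cumulative_recall(ranked, truth, max_rank=3)
print(recall_curve)
```

By construction the curve never decreases with k, which matches the shape described for Figure 1.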

Case study

Besides the leave-one-out cross-validation statistics, the prediction method was tested on a number of compounds extracted from the SuperTarget database as well as on compounds experimentally tested in tumor-cell line assays. The starting point for the first screening was Enalapril, an ACE-inhibitor that is used in the treatment of hypertension and congestive heart failure. SuperPred identifies six putative drugs with sufficient similarity to Enalapril, indicated by a green color in the result table (Tanimoto coefficient >0.8). An inspection of the referenced literature via the PubChem database indicated a similar medical effect for all of them. As examples, Table 2 lists three of the six putative compounds together with the reference that describes their inhibition of the angiotensin-converting enzyme.

Table 2.

Compounds identified with SuperPred and similar to Enalapril and NSC 600221, respectively

Name of the compound	Tanimoto coefficient (%)	Medical function	Target protein	Reference
Enalapril	100.00	ACE-inhibitor	Angiotensin-converting enzyme	(22)
Sch 31846^a	94.57	ACE-inhibitor (predicted)	Angiotensin-converting enzyme (predicted)	(23)
Delapril hydrochloride	83.84	ACE-inhibitor (predicted)	Angiotensin-converting enzyme (predicted)	(24)
Hoe 065^b	81.90	ACE-inhibitor/increasing central cholinergic activity (predicted)	Angiotensin-converting enzyme (predicted)	(25)
NSC 600221^c	100.00	Antineoplastic agent	Tubulin (predicted)	http://dtp.nci.nih.gov
Paclitaxel	91.62	Antineoplastic agent	Tubulin beta-1 chain	(26)

^a (2S,3aS,7aS)-1-((S)-N-((S)-1-Carboxy-3-phenylpropyl)alanyl) hexahydro-2 indolinecarboxylic acid, 1-ethyl ester, monohydrochloride.

^b Cyclopenta(c)pyrrole-1-carboxylic acid, 2-(2-((1-(ethoxycarbonyl)-3-phenylpropyl)amino)-1-oxopropyl)octahydro-, octyl ester, (1S-(1-alpha,2-(R*(R*)),3a-beta,6a-alpha))-, (Z)-2-butenedioate (1:1).

^c Beta-Phenylalanine, N-benzoyl-2-[[(2-carboxyethyl) carbonyl]oxy]-, 6,12b-diacetoxy-12-(benzoyloxy)-2a,3,3a,4,5,6,9, 10,11,12,12a,12b-dodecahydro-4,11- dihydroxy-4a,8,13, 13-tetramethyl-5-oxo-7,11-methano- 1H-cyclodeca[3,4]benz[1, 2-b]oxet-9-yl ester.

In the Enalapril screening, for example, the compound Sch 31846 has a similarity of about 95% to the well-known ACE-inhibitor Enalapril and is therefore presumed to be an ACE-inhibitor, too. The National Cancer Institute Developmental Therapeutics Program (DTP) has screened about 100 000 compounds against a panel of 60 human tumor-cell lines. The results are available on the DTP web site (http://dtp.nci.nih.gov/). The growth inhibition (GI50) and lethal dose (LD50) values of the compounds are also retrievable. Compounds from this comprehensive tumor-related information resource of the NCI were extracted and screened against the approved drugs included in SuperPred. Many of the screening candidates were characterized by a high physicochemical similarity to well-annotated anti-cancer drugs. For instance, the NCI compound NSC 600221 (Table 2), a screening hit with unverified target, and the antineoplastic agent Paclitaxel share a Tanimoto coefficient of about 0.92 (91.62% in Table 2). NSC 600221 is therefore predicted to be an antineoplastic agent targeting tubulin. Both compounds are shown in Figure 2.
To compare the ability of the two compounds to inhibit the proliferation of cancer cells, their GI50 values were analyzed with COMPARE, a web-accessible tool for investigating mechanisms of cell growth inhibition (20). The ability to inhibit the growth of the diverse set of cell lines was highly similar, as indicated by a correlation coefficient of 0.87 calculated by COMPARE. The high correlation coefficient even allowed predictions about the target protein of NSC 600221 (21). As Paclitaxel inhibits microtubule formation by binding to tubulin, tubulin is also a plausible target for NSC 600221.

Figure 2.

Assembly of the SuperPred server and possible requests for ATC-code prediction. Data: the SuperPred server now contains 2500 compounds of the SuperDrug database. Additionally, 3800 experimental drugs were classified and stored on the server. The drugs are annotated by 7300 links to targets. Methods: the structural properties of the compounds are stored in so-called structural fingerprints, where each bit encodes for an element of the compound structure. The similarity of two compounds is calculated by using the Tanimoto coefficient. Moreover, physicochemical properties are stored for each compound. SuperPred can be used to find new targets for ligands and vice versa to find new ligands for medical biological targets. There are two possibilities to use the SuperPred server. The figure shows two examples for querying the SuperPred server.

CONCLUSION

The SuperPred web-server was created for predicting medical indications for chemical compounds. The combination of physicochemical property analysis and similarity searching makes it possible to detect new biologically active compounds and novel targets for drug-like compounds. SuperPred can also be applied for drug repositioning purposes. A further intention of SuperPred is to find side effects of drugs that are caused by off-target hits. The use of the web-server is free for all academics.

25 in total

1. Do structurally similar molecules have similar biological activity?

Authors: Yvonne C Martin; James L Kofron; Linda M Traphagen
Journal: J Med Chem Date: 2002-09-12 Impact factor: 7.446

2. The druggable genome: an update.

Authors: Andreas P Russ; Stefan Lampel
Journal: Drug Discov Today Date: 2005-12 Impact factor: 7.851

Review 3. Finding new tricks for old drugs: an efficient route for public-sector drug discovery.

Authors: Kerry A O'Connor; Bryan L Roth
Journal: Nat Rev Drug Discov Date: 2005-12 Impact factor: 84.694

4. The selection and use of essential medicines. Report of the WHO expert committee, 2005 (including the 14th model list of essential medicines).

Authors:
Journal: World Health Organ Tech Rep Ser Date: 2006

5. The role of innovation in drug development.

Authors: J Drews; S Ryser
Journal: Nat Biotechnol Date: 1997-12 Impact factor: 54.908

6. Assessing the ability of chemical similarity measures to discriminate between active and inactive compounds.

Authors: J S Delaney
Journal: Mol Divers Date: 1996-08 Impact factor: 2.943

7. Characterization of the Taxol binding site on the microtubule. Identification of Arg(282) in beta-tubulin as the site of photoincorporation of a 7-benzophenone analogue of Taxol.

Authors: S Rao; L He; S Chakravarty; I Ojima; G A Orr; S B Horwitz
Journal: J Biol Chem Date: 1999-12-31 Impact factor: 5.157

8. Effect on the development of ankle edema of adding delapril to manidipine in patients with mild to moderate essential hypertension: a three-way crossover study.

Authors: Roberto Fogari; GianDomenico Malamani; Annalisa Zoppi; Amedeo Mugellini; Andrea Rinaldi; Elena Fogari; Tiziano Perrone
Journal: Clin Ther Date: 2007-03 Impact factor: 3.393

9. Addition of eplerenone to an angiotensin-converting enzyme inhibitor effectively improves nitric oxide bioavailability.

Authors: Toshio Imanishi; Hideyuki Ikejima; Hiroto Tsujioka; Akio Kuroi; Katsunobu Kobayashi; Yasuteru Muragaki; Seiichi Mochizuki; Masami Goto; Kiyoshi Yoshida; Takashi Akasaka
Journal: Hypertension Date: 2008-01-28 Impact factor: 10.190

10. Computer-aided discovery of anti-inflammatory thiazolidinones with dual cyclooxygenase/lipoxygenase inhibition.

Authors: Athina A Geronikaki; Alexey A Lagunin; Dimitra I Hadjipavlou-Litina; Phaedra T Eleftheriou; Dmitrii A Filimonov; Vladimir V Poroikov; Intekhab Alam; Anil K Saxena
Journal: J Med Chem Date: 2008-02-27 Impact factor: 7.446
