Othman Soufan, Wail Ba-Alawi, Moataz Afeef, Magbubah Essack, Panos Kalnis, Vladimir B Bajic.
Abstract
BACKGROUND: Mining high-throughput screening (HTS) assays is key to improving decisions in drug repositioning and drug discovery. However, developing suitable and accurate methods for extracting useful information from these assays poses many challenges. Virtual screening and the wide variety of databases, methods and solutions proposed to date have not completely overcome these challenges. This study is based on a multi-label classification (MLC) technique for modeling correlations between several HTS assays, meaning that a single prediction represents a subset of correlated labels rather than a single label. The devised method thus increases the probability of accurate predictions for compounds that were not tested in particular assays.
RESULTS: Here we present DRABAL, a novel MLC solution that incorporates structure learning of a Bayesian network as a step to model dependency between the HTS assays. In this study, DRABAL was used to process more than 1.4 million interactions of over 400,000 compounds and to analyze the existing relationships between five large HTS assays from the PubChem BioAssay Database. Compared to other MLC methods, DRABAL significantly improves the F1Score by about 22% on average. We further illustrate the utility of DRABAL by screening FDA-approved drugs and reporting those that have a high probability of interacting with several targets, thus enabling drug multi-target repositioning. Specifically, DRABAL suggests the drug Thiabendazole as a common activator of the NPC1 and Rab-9A proteins, both of which are targeted in assays designed to identify treatment modalities for Niemann-Pick type C disease.
CONCLUSION: We developed a novel MLC solution based on a Bayesian active learning framework to overcome the lack of fully labeled training data and to exploit actual dependencies between the HTS assays. The solution is motivated by the need to model dependencies between existing experimental confirmatory HTS assays and to improve prediction performance. We have conducted extensive experiments over several HTS assays and have shown the advantages of DRABAL. The datasets and programs can be downloaded from https://figshare.com/articles/DRABAL/3309562.
Year: 2016 PMID: 27895719 PMCID: PMC5105261 DOI: 10.1186/s13321-016-0177-8
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Summary of datasets used
| Dataset PubChem ID | Target name | Type of interacting compounds | Active class size | Inactive class size | Active to inactive ratio (imbalance ratio) |
|---|---|---|---|---|---|
| AID 1458 | Survival of motor neuron 2 | Enhancers | 5854 | 193,105 | 1:33 |
| AID 485297 | Ras-related protein Rab-9A | Activators | 9143 | 301,951 | 1:33 |
| AID 485313 | Niemann-Pick C1 protein precursor | Activators | 7586 | 304,846 | 1:40 |
| AID 588342 | Luciferase transcriptional reporter | Inhibitors | 25,159 | 304,600 | 1:12 |
| AID 686978 | Tyrosyl-DNA phosphodiesterase 1 | Inhibitors | 64,212 | 243,136 | 1:4 |
| Total interactions | 1,459,592 | | | | |
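The imbalance ratios and the interaction total in the table can be reproduced from the class sizes alone. The sketch below is only an arithmetic check: the function names are hypothetical, and rounding the ratio to the nearest integer is an assumed convention that happens to match the reported values.

```python
# Active / inactive class sizes copied from the dataset summary table.
ASSAYS = {
    "AID 1458": (5854, 193105),
    "AID 485297": (9143, 301951),
    "AID 485313": (7586, 304846),
    "AID 588342": (25159, 304600),
    "AID 686978": (64212, 243136),
}


def imbalance_ratio(active: int, inactive: int) -> str:
    """Return the active-to-inactive ratio as '1:N' (N rounded; assumed convention)."""
    return f"1:{round(inactive / active)}"


def total_interactions(assays: dict) -> int:
    """Sum active and inactive compound counts over all assays."""
    return sum(active + inactive for active, inactive in assays.values())


if __name__ == "__main__":
    for aid, (active, inactive) in ASSAYS.items():
        print(aid, imbalance_ratio(active, inactive))
    print("Total interactions:", total_interactions(ASSAYS))
```

Every assay is dominated by inactive compounds, which is presumably why the comparison tables report GMean and F-scores rather than plain accuracy.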
Comparison of methods across five different datasets using fivefold cross-validation
| Method | GMean (%) | F1Score (%) | F0.5Score (%) |
|---|---|---|---|
| BR-SVM | 46.04 | 28.84 | 34.39 |
| BR-KNN | 24.59 | 14.91 | 23.26 |
| BR-RF | 55.56 | 45.35 | 61.26 |
| CC-MLE | 40.79 | 28.59 | 46.86 |
| DRABAL | 61.05ᵃ | 51.11ᵃ | 64.52ᵃ |
The HTS assay data are partitioned into five approximately equal-sized, mutually disjoint subgroups, so that in each fold a single subgroup representing 20% of the data is held out for testing only. For each partition (fold) of the data, the model is developed on the training portion and evaluated on the testing portion. The results from the testing folds are averaged to produce an estimate of performance. ᵃ denotes a statistically significant difference compared with all other methods over the five folds, using a t-test at the 5% significance level.
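The evaluation protocol in the footnote can be sketched for a single assay as follows. This is a minimal illustration under stated assumptions, not the authors' code: GMean is taken to be the geometric mean of sensitivity and specificity (the record does not define it), the folds are contiguous rather than shuffled or stratified, and `fit` and `predict` are hypothetical caller-supplied callables.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score; beta=1 gives F1, beta=0.5 weights precision more heavily."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous, disjoint folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, fit, predict, k=5):
    """Average GMean, F1 and F0.5 over the k held-out folds."""
    totals = {"gmean": 0.0, "f1": 0.0, "f0.5": 0.0}
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict(model, [X[i] for i in test_idx])
        tp, fp, tn, fn = confusion_counts([y[i] for i in test_idx], y_pred)
        totals["gmean"] += g_mean(tp, fp, tn, fn)
        totals["f1"] += f_beta(tp, fp, fn, beta=1.0)
        totals["f0.5"] += f_beta(tp, fp, fn, beta=0.5)
    return {name: value / k for name, value in totals.items()}
```

With class ratios between 1:4 and 1:40, a real run would more likely shuffle and stratify the folds so that every test fold contains active compounds; the contiguous split here only keeps the sketch short.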
Fig. 1 Precision comparison of DRABAL and BR-RF over five HTS assays. Precision is evaluated at the sensitivity levels of BR-RF (the second-best method) to highlight the gain achieved by DRABAL
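Comparing precision at a fixed sensitivity level means lowering the decision threshold of one method until it recovers the same fraction of actives as the reference method, then reading off its precision at that point. The function below is a hypothetical sketch of that idea; it assumes each compound has a real-valued score and ignores tied scores, and it is not taken from the DRABAL code.

```python
def precision_at_sensitivity(scores, labels, target_sensitivity):
    """Precision at the highest threshold whose sensitivity reaches the target.

    scores: predicted scores, higher means more likely active.
    labels: ground-truth labels encoded as 0/1.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Walk down the ranking, accepting one more compound at each step.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        if tp / n_pos >= target_sensitivity:
            break
    return tp / (tp + fp)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    labels = [1, 0, 1, 0, 1]
    print(precision_at_sensitivity(scores, labels, 0.6))
```

Fixing sensitivity in this way makes the precision values of two methods directly comparable, since both are forced to recover the same share of active compounds.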
Comparison of methods across five different datasets using fivefold cross validation
| Method | GMean (%) | F1Score (%) | F0.5Score (%) |
|---|---|---|---|
| RandomOrder-10 | 51.14 | 36.84 | 46.44 |
| DRABAL | 61.05ᵃ | 51.11ᵃ | 64.52ᵃ |
The HTS assay data are partitioned into five approximately equal-sized, mutually disjoint subgroups, so that in each fold a single subgroup representing 20% of the data is held out for testing only. For each partition (fold) of the data, the model is developed on the training portion and evaluated on the testing portion. The results from the testing folds are averaged to produce an estimate of performance. ᵃ denotes a statistically significant difference compared with all other methods over the five folds, using a t-test at the 5% significance level.
Fig. 2 Venn diagram of correct predictions for four selected methods. The diagram shows the average counts (i.e. averaged over fivefold cross-validation) of correct predictions by the four methods and the counts matching the actual ground truth
Top five predicted interactions from DrugBank approved drugs database
| Rank | AID 1458 | AID 485297 | AID 485313 | AID 588342 | AID 686978 |
|---|---|---|---|---|---|
| 1 | Amlexanox DB01025 (0.48) | Nitazoxanide DB00507 (0.67) | Thiabendazole DB00730 (0.33) | Phenazopyridine DB01438 (0.68) | Vinblastine DB00570 (0.99) |
| 2 | Mycophenolate mofetil DB00688 (0.3) | Thiabendazole DB00730 (0.61) | Omeprazole DB00338 (0.21) | Mitoxantrone DB01204 (0.53) | Plicamycin DB06810 (0.99) |
| 3 | Rabeprazole DB01129 (0.2) | Omeprazole DB00338 (0.23) | Phenazopyridine DB01438 (0.21) | Phenindione DB00498 (0.49) | Bromocriptine DB01200 (0.98) |
| 4 | Pramipexole DB00413 (0.14) | Nabumetone DB00461 (0.19) | Mebendazole DB00643 (0.12) | Olsalazine DB01250 (0.47) | Ketoconazole DB01026 (0.98) |
| 5 | Idoxuridine DB00249 (0.13) | Mycophenolic acid DB01024 (0.18) | Olsalazine DB01250 (0.12) | Amsacrine DB00276 (0.44) | Teniposide DB00444 (0.97) |
Thiabendazole (DB00730) is the top common prediction for BioAssays AID 485297 and AID 485313
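The common prediction can be recovered from the table by intersecting the two top-five lists. In the sketch below the scores are copied from the table, while ranking shared drugs by the lower of their two scores is an assumption made for illustration; the record does not state how the common predictions were ordered.

```python
# Top-five predicted drugs and scores for the two assays, copied from the table.
TOP5 = {
    "AID 485297": {
        "Nitazoxanide": 0.67,
        "Thiabendazole": 0.61,
        "Omeprazole": 0.23,
        "Nabumetone": 0.19,
        "Mycophenolic acid": 0.18,
    },
    "AID 485313": {
        "Thiabendazole": 0.33,
        "Omeprazole": 0.21,
        "Phenazopyridine": 0.21,
        "Mebendazole": 0.12,
        "Olsalazine": 0.12,
    },
}


def common_predictions(first: dict, second: dict) -> list:
    """Drugs present in both lists, ordered by their weaker score (assumed criterion)."""
    shared = first.keys() & second.keys()
    rows = [(drug, first[drug], second[drug]) for drug in shared]
    return sorted(rows, key=lambda row: min(row[1], row[2]), reverse=True)


if __name__ == "__main__":
    for drug, score_a, score_b in common_predictions(TOP5["AID 485297"], TOP5["AID 485313"]):
        print(drug, score_a, score_b)
```

Thiabendazole and Omeprazole are the only drugs that appear in both lists, and Thiabendazole scores higher in both assays, so it ranks first under any monotone combination of the two scores.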
Fig. 3 Chemical-protein interaction graph generated using the STITCH tool. STITCH was queried with the NPC1 and Rab-9A concepts to produce this graph. Nodes showing concepts not directly related to the generated graph were removed to highlight the concepts most relevant to the repositioned drug
Fig. 4 Illustration of our proposed method, DRABAL. DRABAL has two learning phases, a Bayesian learning phase and an active learning phase, for building the multi-label classification models
Fig. 5 Bayesian network for the five HTS assays used. The size of each node indicates the number of positive interactions reported in the corresponding HTS assay