Montiago X LaBute¹, Xiaohua Zhang², Jason Lenderman¹, Brian J Bennion², Sergio E Wong², Felice C Lightstone².
Abstract
Late-stage or post-market identification of adverse drug reactions (ADRs) is a significant public health issue and a source of major economic liability for drug development. Thus, reliable in silico screening of drug candidates for possible ADRs would be advantageous. In this work, we introduce a computational approach that predicts ADRs by combining the results of molecular docking with known ADR information from DrugBank and SIDER. We employed a recently parallelized version of AutoDock Vina (VinaLC) to dock 906 small-molecule drugs to a virtual panel of 409 DrugBank protein targets. L1-regularized logistic regression models were trained on the resulting docking scores of a 560-compound subset of the initial 906 compounds to predict 85 side effects, grouped into 10 ADR phenotype groups. Only 21% (87 out of 409) of the drug-protein binding features involve known targets of the drug subset, providing a significant probe of off-target effects. As a control, associations of this drug subset with the 555 annotated targets of these compounds, as reported in DrugBank, were used as features to train a separate group of models. The Vina off-target models and the DrugBank on-target models yielded comparable median areas under the receiver operating characteristic curve (AUCs) during 10-fold cross-validation (0.60-0.69 and 0.61-0.74, respectively). Evidence was found in the PubMed literature to support several putative ADR-protein associations identified by our analysis. Among them were several associations between neoplasm-related ADRs and known tumor suppressor and tumor invasiveness marker proteins. A dual role for interstitial collagenase in both neoplasms and aneurysm formation was also identified. These associations all involve off-target proteins and could not have been found using available drug/on-target interaction data.
This study illustrates a path forward to comprehensive ADR virtual screening that can potentially scale with an increasing number of CPUs to tens of thousands of protein targets and millions of potential drug candidates.
Year: 2014 PMID: 25191698 PMCID: PMC4156361 DOI: 10.1371/journal.pone.0106298
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
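The modeling step described in the abstract, L1-regularized logistic regression on docking scores, can be illustrated with a minimal pure-Python sketch. It assumes one binary label per drug for a single ADR group and fits the model by proximal gradient descent (ISTA). The solver, the penalty strength `lam`, the step size `lr`, and the toy data are all assumptions made for illustration. This is not the authors' implementation. Features are z-scored first so that the penalty treats every docking-score column comparably.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def standardize(X: List[List[float]]) -> List[List[float]]:
    """Z-score each feature column (a constant column becomes all zeros)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=500) -> Tuple[float, List[float]]:
    """Minimize mean log-loss + lam * sum|w_j| by proximal gradient descent.

    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # residual
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n
        # Gradient step on w, then the L1 proximal (soft-threshold) step.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return b, w


def predict_proba(b: float, w: List[float], X: List[List[float]]) -> List[float]:
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Toy data (made up): column 0 mimics a Vina score (more negative = tighter
# predicted binding), column 1 carries no class signal. y = 1 means the drug
# is annotated with the ADR group.
X_raw = [[-9.0, 0.3], [-8.5, -0.2], [-8.0, 0.1], [-6.0, 0.1], [-5.5, -0.2], [-5.0, 0.3]]
y = [1, 1, 1, 0, 0, 0]
Z = standardize(X_raw)
b, w = fit_l1_logistic(Z, y)
print(w)  # w[0] is negative; w[1] is shrunk to exactly 0.0
```

The soft-threshold step sets uninformative coefficients to exactly zero. This is why an L1 penalty suits a wide panel of candidate off-targets: the proteins that keep a nonzero coefficient (the beta column in the tables of this record) are the ones a model associates with the ADR.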
Figure 1. Data integration/analysis workflow scheme.
The UniProt IDs of 4,020 proteins identified in DrugBank as drug targets were extracted. We then obtained 409 experimental protein structures from the Protein Data Bank (PDB) to serve as a virtual panel and docked them against 906 FDA-approved small-molecule compounds using the VinaLC docking code, run on a high-performance computing machine at LLNL. Of these compounds, 560 had side-effect information in the SIDER database and were used in the subsequent statistical analysis to build logistic regression models for ADR prediction.
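The workflow in Figure 1 amounts to a join between the docking output and the SIDER annotations, followed by one binary label vector per ADR group. The sketch below assumes simple in-memory dictionaries. The input layout, the drug names, and the rule that a drug is positive for a group when it has any of that group's side effects are illustrative assumptions. The UniProt IDs and side-effect terms are borrowed from the tables in this record, and the docking scores are made up.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input shapes (not the authors' data structures):
#   docking[(drug_id, uniprot_id)] -> VinaLC docking score (kcal/mol)
#   sider[drug_id]                 -> set of side-effect terms for that drug
#   adr_groups[group]              -> set of side-effect terms in that ADR group


def build_feature_matrix(docking: Dict[Tuple[str, str], float],
                         drugs: List[str],
                         proteins: List[str],
                         sider: Dict[str, Set[str]]) -> Tuple[List[str], List[List[float]]]:
    """Keep drugs with at least one SIDER side effect; return (kept_drugs, rows),
    where rows[i][j] is the score of kept_drugs[i] docked against proteins[j]."""
    kept = [d for d in drugs if sider.get(d)]
    rows = [[docking[(d, p)] for p in proteins] for d in kept]
    return kept, rows


def group_labels(kept: List[str],
                 sider: Dict[str, Set[str]],
                 adr_groups: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """One-vs-all labels: 1 if the drug has any side effect in the group (assumed rule)."""
    return {g: [1 if sider[d] & terms else 0 for d in kept] for g, terms in adr_groups.items()}


# Made-up example: drugC has no SIDER annotations and is dropped.
docking = {("drugA", "P03956"): -8.1, ("drugA", "P43405"): -6.4,
           ("drugB", "P03956"): -7.2, ("drugB", "P43405"): -9.0,
           ("drugC", "P03956"): -5.0, ("drugC", "P43405"): -5.5}
sider = {"drugA": {"breast neoplasm"}, "drugB": {"aneurysm"}, "drugC": set()}
adr_groups = {"Neoplasms": {"breast neoplasm", "glioma"},
              "Vascular disorders": {"aneurysm", "aortic aneurysm"}}
kept, rows = build_feature_matrix(docking, ["drugA", "drugB", "drugC"], ["P03956", "P43405"], sider)
labels = group_labels(kept, sider, adr_groups)
print(kept, rows, labels)
```

Each entry of `labels` is the target vector for one one-vs-all model, and `rows` is the shared feature matrix (560×409 in the study).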
Top-ranked ADR-protein associations derived from models built using the 560×409 docking score matrix.
| UniProt Name | UniProt ID | PDB ID | p-value | q-value | beta | UniProt protein-MedDRA side effect PubMed hits |
|---|---|---|---|---|---|---|
| Interstitial collagenase | P03956 | 1hfc | 0.004 | 0.531 | 2.348 | breast neoplasm(158), adenocarcinoma(161), glioma(34), basal cell carcinoma(22) |
| Tyrosine-protein kinase SYK | P43405 | 1xbb | 0.012 | 0.531 | 1.213 | breast neoplasm(46), adenocarcinoma(11) |
| Peroxisome proliferator-activated receptor alpha | Q07869 | 2znn | 0.016 | 0.531 | 0.602 | breast neoplasm(95), adenocarcinoma(146), glioma(25), basal cell carcinoma(14) |
| Complement C3 | P01024 | 2wy8 | 0.034 | 0.531 | 0.698 | breast neoplasm(65), adenocarcinoma(136), glioma(21), lung neoplasms malignant(12), basal cell carcinoma(7) |
| Cytotoxic T-lymphocyte protein 4 | P16410 | 3osk | 0.003 | 0.555 | 0.211 | sarcoidosis(11), vasculitis(24) |
| Profilin-1 | P07737 | 1fil | 0.000 | 0.005 | 0.338 | endocrine disorder(10) |
| Coagulation factor IX | P00740 | 1edm | 0.000 | 0.005 | 0.019 | endocrine disorder(108), diabetes mellitus(48), thyroid disorder(22), hyperthyroidism(11), hypothyroidism(10) |
| Interleukin-5 | P05113 | 1hul | 0.000 | 0.005 | 0.092 | endocrine disorder(35), diabetes mellitus(19), thyroid disorder(10) |
| Caspase-3 | P42574 | 2dko | 0.002 | 0.188 | −1.876 | bipolar disorder(14), schizophrenia(31) |
| Integrin beta-2 | P05107 | 2p26 | 0.020 | 1.000 | −0.886 | cardiac arrest(11), cardiomyopathy(44), myocardial infarction(46) |
| Interstitial collagenase | P03956 | 1hfc | 0.000 | 0.060 | 0.429 | aneurysm(39), aortic aneurysm(31), arteriosclerosis(123) |
| Gelsolin | P06396 | 2fh1 | 0.000 | 0.009 | −0.073 | nephropathy(38), renal failure(12) |
The docked protein responsible for the association with the ADR is identified in the first, second, and third columns by its UniProt name, UniProt ID, and corresponding PDB ID, respectively. Columns 4, 5, and 6 give the statistical significance of the association: the p-value, the associated false discovery rate (q-value), and the corresponding beta coefficient in the median-AUC logistic regression model. Column 7 lists the PubMed results that support the protein-side effect association, with the number of hits shown in parentheses. Bold UniProt IDs are off-target proteins (i.e., not intended targets of the 732 drugs we consider).
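The q-value column reports a false discovery rate for each p-value. One common way to obtain such values is the Benjamini-Hochberg adjustment, sketched here in plain Python. This record does not state whether the authors used this procedure or another FDR estimator, so treat the choice as an assumption. The example p-values are made up and are not taken from the table.

```python
from typing import List


def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    With p-values sorted as p_(1) <= ... <= p_(m), the adjusted value at rank i is
    min over k >= i of min(1, p_(k) * m / k).
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each value is the minimum over higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Made-up p-values for four protein features of one ADR model.
print(bh_qvalues([0.01, 0.04, 0.03, 0.2]))
```

Each adjusted value is scaled by m/rank, so screening hundreds of docked proteins at once can leave a nominally small p-value with a much larger q-value.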
ADR-protein associations derived from models built using the 560×16 GBSA-corrected virtual screening panel.
| UniProt Name | UniProt ID | Corrected p-value | ADR Group | UniProt protein-MedDRA side effect PubMed hits |
|---|---|---|---|---|
| Amine oxidase [flavin-containing] A | P21397 | 0.005 | bloodAndLymph | agranulocytosis(5) |
| Histamine H1 receptor | P35367 | 0.007 | bloodAndLymph | agranulocytosis(10) |
| Beta-2 adrenergic receptor | P07550 | 0.007 | endocrineDisorders | endocrine disorder(164), diabetes mellitus(98), thyroid disorder(31), hyperthyroidism(19), hypothyroidism(16) |
| 5-hydroxytryptamine receptor 1B | P28222 | 0.007 | endocrineDisorders | endocrine disorder(15), diabetes mellitus(11) |
| Androgen receptor | P10275 | 0.018 | psychDisorders | schizophrenia(18) |
| Prostaglandin G/H synthase 2 | P35354 | 0.024 | cardiacDisorders | cardiac arrest(11), cardiomegaly(22), cardiomyopathy(91), myocardial infarction(217), myocarditis(11) |
Figure 2. ADR prediction models using ‘Vina Off Targets’ and ‘DrugBank On-Targets’.
Boxplots of median AUC results are shown for one-vs.-all L1-regularized logistic regression models trained using 10-fold cross-validation repeated ten times. The individual models were trained on ten different adverse drug reaction (ADR) groups: Vascular disorders ("Vascular disorders"); Neoplasms, benign, malignant, and unspecified ("Neoplasms"); Immune system disorders ("Immune system disorders"); Blood and lymphatic system disorders ("Blood and lymphatic disorders"); Psychiatric disorders ("Psychiatric disorders"); Endocrine disorders ("Endocrine disorders"); Renal disorders ("Renal & urinary disorders"); Hepatobiliary disorders ("Liver disorders"); Gastrointestinal disorders ("Gastrointestinal disorders"); and Cardiac disorders ("Cardiac disorders"). Red boxes indicate models trained on the 560×409 matrix of VinaLC docking scores used as drug-protein binding features. Blue boxes indicate models trained on a 560×555 matrix of DrugBank drug-target protein associations. VinaLC off-target models had higher AUCs than DrugBank on-target models for the "Vascular disorders" and "Neoplasms" ADR groups.
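The evaluation protocol in this caption can be sketched as follows: one model per ADR group, 10-fold cross-validation repeated ten times, and a median AUC as the summary. The caption does not give every detail, so several choices here are assumptions: stratified fold assignment, a rank-based AUC, and taking the median over all per-fold AUCs. `centroid_scorer` is a hypothetical stand-in for the L1-regularized logistic regression, kept simple so that the block stays self-contained.

```python
import random
import statistics
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle each class separately and deal its indices round-robin into k folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y) if t == cls]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)
    return folds


def centroid_scorer(X_train: List[List[float]], y_train: List[int]) -> Model:
    """Hypothetical stand-in model: project onto (mean of positives - mean of negatives)."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in range(len(X_train[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_cv_median_auc(X, y, fit: Callable[..., Model], k: int = 10,
                           repeats: int = 10, seed: int = 0) -> float:
    """Median test-fold AUC over `repeats` independent k-fold splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            y_test = [y[i] for i in test_idx]
            if not 0 < sum(y_test) < len(y_test):
                continue  # AUC is undefined unless the fold holds both classes
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            aucs.append(auc(y_test, [model(X[i]) for i in test_idx]))
    return statistics.median(aucs)


# Made-up, perfectly separable toy data: one docking-score feature per drug.
X = [[-9.0 - 0.1 * i] for i in range(20)] + [[-5.0 + 0.1 * i] for i in range(20)]
y = [1] * 20 + [0] * 20
median_auc = repeated_cv_median_auc(X, y, centroid_scorer)
print(median_auc)  # 1.0 on this toy data
```

Stratification keeps a similar positive/negative ratio in every fold, which matters when an ADR group is rare. Repeating the split with fresh shuffles and reporting a median reduces the influence of any single partition.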
Figure 3. ADR prediction using a 16-protein virtual toxicity screening panel suggested by Bowes et al. [6].
Red boxes indicate models trained on GBSA-corrected VinaLC docking scores, while blue boxes indicate models trained on DrugBank drug-target protein associations. The boxplots show the distribution of median AUC scores from one-vs.-all L1-regularized logistic regression models trained using 10-fold cross-validation repeated ten times. The individual models were trained on ten different adverse drug reaction (ADR) groups: Neoplasms, benign, malignant, and unspecified ("Neoplasms"); Immune system disorders ("Immune system disorders"); Cardiac disorders ("Cardiac disorders"); Gastrointestinal disorders ("Gastrointestinal disorders"); Blood and lymphatic system disorders ("Blood and lymphatic disorders"); Hepatobiliary disorders ("Liver disorders"); Vascular disorders ("Vascular disorders"); Endocrine disorders ("Endocrine disorders"); Psychiatric disorders ("Psychiatric disorders"); and Renal disorders ("Renal & urinary disorders").