Eitan Margulis, Ayana Dagan-Wiener, Robert S Ives, Sara Jaffari, Karsten Siems, Masha Y Niv.
Abstract
Drug development is a long, expensive and multistage process geared to achieving safe drugs with high efficacy. A crucial prerequisite for completing the medication regimen for oral drugs, particularly for pediatric and geriatric populations, is achieving taste that does not hinder compliance. Currently, the aversive taste of drugs is tested in late stages of clinical trials. This can result in the need to reformulate, potentially resulting in the use of more animals for additional toxicity trials, increased financial costs and a delay in release to the market. Here we present BitterIntense, a machine learning tool that classifies molecules as "very bitter" or "not very bitter" based on their chemical structure. The model, trained on chemically diverse compounds, has above 80% accuracy on several test sets. Our results suggest that about 25% of drugs are predicted to be very bitter, with even higher prevalence (~40%) in COVID19 drug candidates and in microbial natural products. Only ~10% of toxic molecules are predicted to be intensely bitter, and the results also suggest that intense bitterness does not correlate with hepatotoxicity of drugs. However, very bitter compounds may be more cardiotoxic than not very bitter compounds, possessing significantly lower QPlogHERG values. BitterIntense allows quick and easy prediction of strong bitterness of compounds of interest for the food, pharma and biotechnology industries. We estimate that implementation of BitterIntense or similar tools early in the drug discovery process may reduce delays, animal use and the overall financial burden.
Keywords: Bitter; Drug discovery; Drugs; Machine learning; Taste; Toxicity
Year: 2020 PMID: 33510862 PMCID: PMC7807207 DOI: 10.1016/j.csbj.2020.12.030
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1 Representation of chemical families in the training set. (A) Top chemical families represented in the dataset of very bitter compounds. (B) Top chemical families represented in the dataset of not very bitter compounds. The compound on the lower left (Asperosaponin VI) is a representative of very bitter triterpene saponins; the compound on the lower right (Nitrosaccharin) is a representative of not very bitter benzothiazoles. Deriv. = derivatives.
BitterIntense performance on the training, test and hold-out sets. Training set evaluation was done using k-fold cross-validation with k = 10. The results in the training set column are the mean of each metric with its standard deviation across the 10 iterations of cross-validation.
| Metric | Training set (493 compounds) | Test set (123 compounds) | Hold-out set (105 compounds) |
|---|---|---|---|
| Accuracy (%) | 87 ± 5 | 83 | 80 |
| Precision (%) | 80 ± 8 | 71 | 63 |
| Recall (%) | 85 ± 4 | 86 | 77 |
| F1 score | 82 ± 5 | 78 | 70 |
| Specificity (%) | | 81 | 81 |
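The metrics in the table are linked through the confusion matrix, so the specificity of a set can be estimated from its size, accuracy, precision and recall. The sketch below is an illustrative consistency check, not part of BitterIntense; the function name `implied_specificity` is hypothetical, and because the inputs are the rounded percentages from the table the result is approximate.

```python
def implied_specificity(n, accuracy, precision, recall):
    """Estimate specificity from set size, accuracy, precision and recall.

    With P positives and N = n - P negatives:
      TP = recall * P
      FP = TP * (1 - precision) / precision
      TN = N - FP
      accuracy * n = TP + TN
    Solving the last equation for P gives the expressions below.
    """
    fp_per_positive = recall * (1 - precision) / precision
    positives = n * (1 - accuracy) / (1 - recall + fp_per_positive)
    negatives = n - positives
    false_positives = fp_per_positive * positives
    return (negatives - false_positives) / negatives


# Rounded values taken from the test and hold-out columns of the table.
test_spec = implied_specificity(123, 0.83, 0.71, 0.86)
holdout_spec = implied_specificity(105, 0.80, 0.63, 0.77)
print(round(100 * test_spec), round(100 * holdout_spec))
```

Both estimates round to 81%, matching the two specificity values reported for the test and hold-out sets.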
Fig. 2 Top 15% most important features in the model. Importance is calculated as the average gain of the splits that use the feature, across the trees of the XGBoost model. The features shown are the heavy atom count, molar refractivity, number of likely metabolic reactions (metab), number of tertiary amines and amide groups, π (carbon and attached hydrogen) component of the solvent accessible surface area (PISA), hydrophobicity (AlogP), hydrophobic component of the solvent accessible surface area (FOSA) and predicted IC50 value for blockage of hERG channels (QPlogHERG).
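The gain-based importance described in this caption can be sketched in plain Python. This is a minimal illustration, not the authors' code: the split records, the gain values and the helper names `average_gain` and `top_fraction` are hypothetical. With a fitted XGBoost model, comparable per-feature values are available from `Booster.get_score(importance_type="gain")`.

```python
import math
from collections import defaultdict


def average_gain(splits):
    """Mean gain per feature over all splits that use that feature."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, gain in splits:
        totals[feature] += gain
        counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


def top_fraction(importance, fraction=0.15):
    """Highest-ranked feature names, keeping ceil(fraction * n), at least one."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:keep]


# Hypothetical (feature, gain) records, one per split across all trees.
toy_splits = [
    ("heavy_atom_count", 4.0),
    ("heavy_atom_count", 2.0),
    ("AlogP", 1.0),
    ("QPlogHERG", 2.5),
    ("FOSA", 0.5),
]
importance = average_gain(toy_splits)
print(top_fraction(importance, fraction=0.5))
```

Averaging the gain, instead of counting how often a feature is used, favours features whose splits improve the objective the most even when they appear in few trees.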
Fig. 3 Bitterness levels of drugs and their hepatotoxicity descriptors. (A) Distribution of predicted very bitter (pVB, silver) and predicted not very bitter (pNVB, black) drugs across severity classes of hepatotoxicity. (B) Distribution of pVB and pNVB drugs across DILI concern categories. The severity of hepatotoxicity increases from left to right in both panels.
Fig. 4 QPlogHERG values of VB (grey) and NVB (white) compounds. A statistically significant difference was observed using the Mann-Whitney test (n = 721, P < 0.0001).
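For readers unfamiliar with the Mann-Whitney test, the following is a minimal standard-library sketch of the two-sided test using the normal approximation with a tie correction and no continuity correction. It is not the authors' analysis code: the function name and the toy values are hypothetical and are not QPlogHERG data. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2

    rank_sum_x = 0.0
    tie_sum = 0  # sum of t^3 - t over groups of t tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # 0-based positions i..j-1 are tied; they share the mean of ranks i+1..j.
        midrank = (i + 1 + j) / 2
        t = j - i
        tie_sum += t**3 - t
        for k in range(i, j):
            if pooled[k][1] == 0:
                rank_sum_x += midrank
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u == 0:  # every value is identical, so there is no evidence of a shift
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p_value


# Toy, made-up values for two groups (not data from the study).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
u_stat, p = mann_whitney_u(group_a, group_b)
print(u_stat, p)
```

The test compares ranks and not raw values, so it does not assume that the descriptor is normally distributed in either group.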
Fig. 5 Prevalence of pVB compounds (silver) and pNVB compounds (white) across three datasets: NPatlas, DrugBank and COVID19 drugs. A statistically significant difference in the proportions of pVB compounds was observed using the two-proportion z-test.
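The two-proportion z-test named in this caption can be sketched with the standard library as follows. This is an illustrative pooled-variance version, not the authors' code: the function name is hypothetical, and the counts are made up to echo proportions of roughly 40% and 25%; they are not the dataset sizes used in the study.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(hits_1, n_1, hits_2, n_2):
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    p_1 = hits_1 / n_1
    p_2 = hits_2 / n_2
    # Under H0 the two samples share one proportion, estimated by pooling.
    pooled = (hits_1 + hits_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 40 of 100 compounds predicted very bitter in one
# dataset versus 25 of 100 in another.
z, p_value = two_proportion_z_test(40, 100, 25, 100)
print(round(z, 3), round(p_value, 4))
```

The normal approximation behind this test is usually considered reasonable only when each group contains enough expected compounds in both classes; for small datasets an exact test would be a safer choice.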