Shamkhal Baybekov, Gilles Marcou, Pascal Ramos, Olivier Saurel, Jean-Luc Galzi, Alexandre Varnek.
Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules ("fragments") in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
Keywords: DMSO solubility; NMR; QSPR; fragment-based screening; outlier detection
Year: 2021 PMID: 34203441 PMCID: PMC8271413 DOI: 10.3390/molecules26133950
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Solubility domains defined by the thresholds for stock solutions (10 mM) and FBS (1 mM). The “soluble”/“insoluble” labels given by the two definitions coincide for solubility values larger than 10 mM (soluble under both) and smaller than 1 mM (insoluble under both). In the range 1–10 mM, however, molecules are considered soluble according to the FBS definition but insoluble according to the stock solution definition.
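The two threshold definitions in Figure 1 can be illustrated with a minimal Python sketch. The function and constant names are hypothetical, and treating a value exactly equal to a threshold as soluble is an assumption; the caption does not specify how boundary values are handled.

```python
# Thresholds taken from the Figure 1 caption (in mM).
STOCK_THRESHOLD_MM = 10.0  # stock solution definition
FBS_THRESHOLD_MM = 1.0     # fragment-based screening (FBS) definition


def label_solubility(solubility_mm: float, threshold_mm: float) -> str:
    """Label a compound for one definition.

    Assumption: a value equal to the threshold counts as soluble.
    """
    return "soluble" if solubility_mm >= threshold_mm else "insoluble"


def labels_for(solubility_mm: float) -> dict:
    """Return the labels under both definitions for one measured solubility."""
    return {
        "fbs": label_solubility(solubility_mm, FBS_THRESHOLD_MM),
        "stock": label_solubility(solubility_mm, STOCK_THRESHOLD_MM),
    }


if __name__ == "__main__":
    # One value per domain: below 1 mM, between 1 and 10 mM, above 10 mM.
    for value in (0.5, 5.0, 20.0):
        print(value, labels_for(value))
```

Only values in the intermediate domain, such as 5 mM here, receive different labels under the two definitions.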
Figure 2. The modeling workflow.
Figure 3. The class landscape for the PICT dataset. Blue and red zones are populated by insoluble and soluble molecules, respectively. Green and yellow zones contain a mixture of soluble and insoluble compounds.
Figure 4. The class landscape depicting the coverage of a fragment-like chemical space by PICT and Enamine datasets. Blue and red zones are populated, respectively, by Enamine and PICT molecules. Green and yellow zones contain a mixture of compounds from the two datasets.
Examples of incorrectly predicted compounds and their correctly predicted close analogues.
| # | Incorrectly predicted compound (structure) | Exp | Pred | # | Correctly predicted similar compound (structure) | Exp | Pred |
|---|---|---|---|---|---|---|---|
| | | Soluble | Insoluble | | | Insoluble | Insoluble |
| | | Soluble | Insoluble | | | Insoluble | Insoluble |
| | | Insoluble | Soluble | | | Soluble | Soluble |
Predictive performance of the FBS model on the filtered Enamine data, and of the “stock solution” model on the PICT data. The number of correctly predicted compounds out of the total number of compounds in each class is given in parentheses.
| | FBS Model on Enamine Dataset | “Stock Solution” Model on PICT Dataset |
|---|---|---|
| Recall (soluble) | 0.954 (6828/7156) | 1 (676/676) |
| Recall (insoluble) | 0.052 (6/115) | 0.01 (1/101) |
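The per-class recalls in this table follow directly from the counts in parentheses. The sketch below recomputes them and also derives a balanced accuracy as the mean of the two recalls. The balanced accuracy values are computed here for illustration only and are not reported in the source for these cross-dataset tests; the function and variable names are hypothetical.

```python
def recall(correct: int, total: int) -> float:
    """Fraction of compounds of one class that were predicted correctly."""
    return correct / total


def balanced_accuracy(recalls: list) -> float:
    """Mean of the per-class recalls (the usual definition)."""
    return sum(recalls) / len(recalls)


# Counts copied from the table: (correct, total) per class.
counts = {
    "FBS model on Enamine": {"soluble": (6828, 7156), "insoluble": (6, 115)},
    "Stock solution model on PICT": {"soluble": (676, 676), "insoluble": (1, 101)},
}

if __name__ == "__main__":
    for name, classes in counts.items():
        per_class = {label: recall(c, t) for label, (c, t) in classes.items()}
        ba = balanced_accuracy(list(per_class.values()))
        print(name, {k: round(v, 3) for k, v in per_class.items()}, round(ba, 3))
```

With these counts the derived balanced accuracy is close to 0.5 in both cases, which reflects the very low recall on the insoluble class despite the high recall on the soluble class.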