Marta Enciso, Nastaran Meftahi, Michael L Walker, Brian J Smith.
Abstract
The reliability of quantitative structure-property relationship (QSPR) and quantitative structure-activity relationship (QSAR) models is often difficult to assess because the tools and data used to build the models are hard to access. We present here BioPPSy, which aims to fill this gap by providing an easy-to-use open-source software platform. We demonstrate the program's capabilities by calculating three key properties used in drug discovery: aqueous solubility, Caco-2 cell permeability and blood-brain barrier permeability. For each property, a comparison is made with a number of previously reported methods taken from the literature. The software, including source code, current models and databases, is available from https://sourceforge.net/projects/bioppsy/.
Year: 2016 PMID: 27832156 PMCID: PMC5104412 DOI: 10.1371/journal.pone.0166298
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Snapshot of the BioPPSy software.
Fig 2. Workflow of the BioPPSy software.
Comparison of performance of BioPPSy with literature methods for predicting the logarithm of aqueous solubility.
| Model | d | N | Literature r² | Literature σ | BioPPSy (a) r² | BioPPSy (a) σ | BioPPSy (a) Δmax | BioPPSy (b) r² | BioPPSy (b) σ | BioPPSy (b) MUE | BioPPSy (b) Δmax |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Klopman and Hou. [ | 118 | 1168 | 0.95 | 0.50 | - | - | - | 0.73 | 1.05 | 0.80 | -5.9 |
| McElroy and Jurs [ | 11 | 298 | 0.79 | na | 0.74 | 0.94 | 3.4 | 0.70 | 1.11 | 0.85 | 3.9 |
| Tetko | 33 | 879 | 0.86 | na | 0.83 | 0.82 | -3.0 | 0.83 | 0.84 | 0.66 | -3.0 |
| Cheng and Merz [ | 8 | 755 | 0.84 | na | - | - | - | 0.84 | 0.42 | 0.61 | -4.4 |
| Delaney [ | 4 | 1144 | na | na | 0.82 | 0.89 | 4.0 | 0.82 | 0.86 | 0.64 | 4.7 |
| Hou | 76 | 878 | 0.96 | 0.61 | 0.92 | 0.57 | -2.2 | 0.90 | 0.64 | 0.50 | -2.3 |
d is the number of descriptors used in each model (excluding the intercept). N is the number of molecules in the dataset used in the original model. r² is the coefficient of determination of the fit and σ is the standard deviation. MUE is the mean unsigned error. Δmax is the largest difference between experimental and predicted solubility. A dash (-) indicates the dataset was not available. 'na' indicates the coefficient of determination or standard deviation was not reported.
(a) Regression statistics obtained using d descriptors on datasets of size N reported in the literature.
(b) Regression statistics obtained using d descriptors on the dataset of 1297 organic compounds extracted from the AQUASOL and PHYSPROP datasets.
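The statistics named in the table notes (r², σ, MUE and Δmax) can be illustrated with a short Python sketch. This is not BioPPSy code. The function name `regression_stats` and its parameters are hypothetical, and several conventions are assumptions, since the notes do not spell them out: r² is taken as 1 - SS_res/SS_tot, σ as the residual standard deviation with N - d - 1 degrees of freedom, and Δmax as the signed residual (experimental minus predicted) of largest magnitude.

```python
import math


def regression_stats(experimental, predicted, n_descriptors=0):
    """Summarise the agreement between experimental and predicted values.

    Assumed conventions (not confirmed by the source):
      - residual = experimental - predicted
      - r2 = 1 - SS_res / SS_tot
      - sigma uses N - d - 1 degrees of freedom (d descriptors plus intercept)
      - delta_max is the signed residual with the largest absolute value
    """
    n = len(experimental)
    if n != len(predicted):
        raise ValueError("experimental and predicted must have equal length")
    dof = n - n_descriptors - 1
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")

    residuals = [e - p for e, p in zip(experimental, predicted)]
    mean_exp = sum(experimental) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    if ss_tot == 0:
        raise ValueError("experimental values have zero variance")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "sigma": math.sqrt(ss_res / dof),
        "mue": sum(abs(r) for r in residuals) / n,
        "delta_max": max(residuals, key=abs),
    }
```

For example, `regression_stats(log_s_exp, log_s_pred, n_descriptors=4)` would summarise a four-descriptor model, where `log_s_exp` and `log_s_pred` are placeholder lists of experimental and predicted values. A different choice of degrees of freedom or sign convention would change σ and the sign of Δmax, so values from this sketch need not match those in the tables.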
Comparison of performance of BioPPSy with literature methods for predicting the logarithm of blood-brain barrier permeability.
| Model | d | N | Literature r² | Literature σ | BioPPSy (a) r² | BioPPSy (a) σ | BioPPSy (a) Δmax | BioPPSy (b) r² | BioPPSy (b) σ | BioPPSy (b) MUE | BioPPSy (b) Δmax |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Kansy and van de Waterbeemd [ | 2 | 20 | 0.70 | 0.45 | - | - | - | 0.45 | 0.54 | 0.42 | -1.68 |
| Hou and Xu [ | 4 | 59 | 0.76 | 0.41 | - | - | - | 0.49 | 0.53 | 0.25 | -1.52 |
| Clark [ | 2 | 55 | 0.77 | 0.46 | 0.75 | 0.82 | -1.42 | 0.52 | 0.51 | 0.38 | -1.71 |
| Feher | 3 | 61 | 0.73 | na | 0.63 | 0.39 | -1.17 | 0.54 | 0.50 | 0.38 | 1.58 |
(a) Regression statistics obtained using d descriptors on datasets of size N reported in the literature.
(b) Regression statistics obtained using d descriptors on a dataset containing 181 compounds [28].
Comparison of performance of BioPPSy with literature methods for predicting the logarithm of Caco-2 cell permeability.
| Model | d | N | Literature r² | Literature σ | BioPPSy (a) r² | BioPPSy (a) σ | BioPPSy (a) Δmax | BioPPSy (b) r² | BioPPSy (b) σ | BioPPSy (b) MUE | BioPPSy (b) Δmax |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ertl | 1 | 9 | 0.98 | na | 0.96 | 0.22 | -0.32 | 0.31 | 0.71 | 0.61 | -2.67 |
| Palm | 1 | 6 | 0.99 | na | 0.96 | 0.15 | 0.15 | 0.23 | 0.78 | 0.65 | -2.89 |
| Osterberg & Norinder [ | 4 | 11 | 0.92 | 0.21 | 0.99 | 0.04 | 0.07 | 0.39 | 0.70 | 0.56 | -2.39 |
| van de Waterbeemd | 2 | 17 | 0.69 | na | 0.65 | 0.61 | -1.12 | 0.24 | 0.78 | 0.65 | -2.88 |
| Gozalbes | 13 | 97 | 0.77 | 0.49 | 0.70 | 0.50 | -1.31 | 0.58 | 0.58 | 0.45 | -1.68 |
(a) Regression statistics obtained using d descriptors on datasets of size N reported in the literature.
(b) Regression statistics obtained using d descriptors on the 159-compound dataset of Gozalbes et al. [34].