Cosimo Toma, Domenico Gadaleta, Alessandra Roncaglioni, Andrey Toropov, Alla Toropova, Marco Marzo, Emilio Benfenati.
Abstract
PURPOSE: This study explored several strategies for improving the performance of literature QSAR models for plasma protein binding (PPB): a suitable endpoint transformation, a correct representation of chemicals, greater consistency in the dataset, and a reliable definition of the applicability domain.
Keywords: ADME; QSAR; fu; logk; protein binding
Year: 2018 PMID: 30591975 PMCID: PMC6308215 DOI: 10.1007/s11095-018-2561-8
Source DB: PubMed Journal: Pharm Res ISSN: 0724-8741 Impact factor: 4.200
Fig. 1. Representation of the distribution of PPB data, from Obach (13), before and after transformation. The γ1 of each distribution is indicated.
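The effect of an endpoint transformation on the shape of a distribution can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that γ1 denotes skewness (its usual meaning), that √fu is the square root of the fraction unbound, and that log K is defined as log10((1 − fu)/fu). The clipping constant `eps` and the `fu_values` list are hypothetical and are not taken from the Obach dataset.

```python
import math

def sqrt_fu(fu):
    """Square-root transform of the fraction unbound (0 <= fu <= 1)."""
    return math.sqrt(fu)

def log_k(fu, eps=1e-3):
    """Pseudo-equilibrium constant, ASSUMED here to be log10((1 - fu) / fu).

    fu is clipped to [eps, 1 - eps] so the logarithm stays finite;
    eps is an illustrative choice, not a value from the source.
    """
    fu = min(max(fu, eps), 1.0 - eps)
    return math.log10((1.0 - fu) / fu)

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed fu values, for illustration only.
fu_values = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.45, 0.90]

print("skewness of fu:      ", skewness(fu_values))
print("skewness of sqrt(fu):", skewness([sqrt_fu(v) for v in fu_values]))
print("skewness of log K:   ", skewness([log_k(v) for v in fu_values]))
```

For this toy input the square-root transform gives a lower skewness than the raw values, which is the kind of comparison the figure reports.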
Compounds in Each Dataset for Specific Ionization States
| Ionization state | No. compounds |
|---|---|
| Acid | 122 |
| Base | 137 |
| Neutral | 198 |
| Zwitterions | 55 |
| Total (used for modelling) | 489 |
Size of the Splits for Each Dataset and Number of Descriptors Selected
| Transformation | No. of Dragon descriptors selected with VSURF | No. of compounds in TS | No. of compounds in EVS |
|---|---|---|---|
| Total LogK | 24 | 391 | 98 |
| Total √fu | 16 | 391 | 98 |
| Acids √fu | 8 | 97 | 25 |
| Base √fu | 18 | 158 | 40 |
| Neutral √fu | 10 | 109 | 28 |
| Zwitterions √fu | 6 | 47 | 8 |
Methods Chosen for Defining the AD, Brief Description and Reference
| Method | Description |
|---|---|
| Two-class real-random classification | The descriptors of a mirror copy of the TS are permuted, the two matrices are merged, and a classification model is built to distinguish real values from random ones. |
| Leverage | Based on the leverage (hi) of each compound. New compounds above the hi threshold are considered outside the AD. |
| PCA (threshold: mean±3*SD) | After the first two PCs of the TS descriptors are calculated, a threshold equal to mean ± 3*standard deviation is set for each PC. If the PC values of a new compound fall outside this range, the prediction is considered unreliable. |
| PCA (threshold: 0.5-0.95 percentile) | Same as the method above, but the thresholds are set at the 0.5th and 0.95th percentiles of the distribution of TS compounds. |
| Nearest neighbor distance | Based on the average Euclidean distance between all pairs of TS compounds. If the distance of a VS compound from its nearest neighbor in the TS is greater than a given threshold, the compound is outside the AD. |
| Atom centered fragment (ACF) | All ACFs (a central non-hydrogen atom with all atoms bonded to it) of the TS are calculated. A test compound is considered within the AD if every ACF obtained by its decomposition is among the ACFs identified in the TS. |
| Fingerprint | The average similarity (Tanimoto, based on PubChem fingerprints) of each test compound to the TS is determined. If the average similarity is lower than 0.1, the compound is outside the AD. |
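Two of the distance-based AD checks in the table can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation. The table does not give the nearest-neighbor threshold, so the mean pairwise TS distance is used here as a hypothetical choice. Fingerprints are represented as plain Python sets of on-bit indices rather than real PubChem fingerprints. All function names and toy data are invented for the example.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_pairwise_distance(training):
    """Average Euclidean distance over all pairs of training compounds."""
    pairs = list(combinations(training, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)

def in_ad_nearest_neighbor(query, training, threshold):
    """Inside the AD if the nearest training compound is within threshold."""
    return min(euclidean(query, t) for t in training) <= threshold

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def in_ad_fingerprint(query_fp, training_fps, cutoff=0.1):
    """Inside the AD if the mean similarity to the TS is at least cutoff."""
    mean_sim = sum(tanimoto(query_fp, fp) for fp in training_fps) / len(training_fps)
    return mean_sim >= cutoff

# Toy training set: hypothetical 2-D descriptors and bit sets.
ts_descriptors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ts_fingerprints = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

# ASSUMPTION: the mean pairwise TS distance serves as the threshold.
threshold = mean_pairwise_distance(ts_descriptors)

print(in_ad_nearest_neighbor((0.5, 0.5), ts_descriptors, threshold))
print(in_ad_nearest_neighbor((5.0, 5.0), ts_descriptors, threshold))
print(in_ad_fingerprint({2, 3}, ts_fingerprints))
print(in_ad_fingerprint({9, 10}, ts_fingerprints))
```

The 0.1 cutoff for the fingerprint check is the value stated in the table. The descriptor-space threshold should be tuned to the dataset at hand.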
SMILES Attributes and their Description
| SMILES attributes | Description |
|---|---|
Performance of PPB Predicting Models
| Model / set | r²/Q² | RMSE | Coverage | AD |
|---|---|---|---|---|
| Dragon (log K) | | | | |
| TS (5-fold CV) | 0.61 | 0.72 | - | |
| EVS | 0.65 | 0.68 | | |
| EVS (in AD) | 0.68 | 0.65 | 0.98 | PCA – mean±3*SD |
| Dragon (√fu) | | | | |
| TS (5-fold CV) | 0.62 | 0.19 | - | |
| EVS | 0.70 | 0.17 | | |
| EVS (in AD) | 0.72 | 0.16 | 0.87 | Two-class real-random classification |
| CORAL (√fu) | | | | |
| TS+ITS+CS | 0.61 | 0.19 | - | |
| EVS | 0.69 | 0.17 | | |
| EVS (in AD) | 0.74 | 0.12 | 0.77 | CORAL AD |
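The three reported quantities (r², RMSE and coverage) can be computed as in the sketch below. It assumes that r² is the coefficient of determination, 1 − SS_res/SS_tot, and that coverage is the fraction of EVS compounds falling inside the AD. The source does not state either definition, so both are assumptions, and the function names and toy values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def evaluate(y_true, y_pred, in_ad=None):
    """Score predictions, optionally restricted to compounds inside the AD.

    in_ad is a list of booleans (True = inside the AD). Coverage is the
    fraction of compounds retained after applying the AD filter.
    """
    if in_ad is None:
        in_ad = [True] * len(y_true)
    kept = [(t, p) for t, p, flag in zip(y_true, y_pred, in_ad) if flag]
    kept_true = [t for t, _ in kept]
    kept_pred = [p for _, p in kept]
    return {
        "r2": r_squared(kept_true, kept_pred),
        "rmse": rmse(kept_true, kept_pred),
        "coverage": len(kept) / len(y_true),
    }

# Hypothetical observed and predicted sqrt(fu) values with AD flags.
observed = [0.1, 0.4, 0.6, 0.9]
predicted = [0.2, 0.4, 0.5, 0.9]
inside_ad = [True, True, True, False]

print(evaluate(observed, predicted))             # all EVS compounds
print(evaluate(observed, predicted, inside_ad))  # only compounds in the AD
```

Reporting coverage next to r² and RMSE shows how many compounds were excluded to obtain the in-AD scores.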
Performance of PPB Predicting Models for Specific Ionization States
| Ionization state / set | r²/Q² | RMSE | Coverage |
|---|---|---|---|
| Acid | | | |
| TS (5-fold CV) | 0.61 | 0.20 | - |
| EVS | 0.72 | 0.17 | |
| EVS (with two-class real-random classification AD) | 0.73 | 0.17 | 0.96 |
| Base | | | |
| TS (5-fold CV) | 0.60 | 0.18 | - |
| EVS | 0.46 | 0.20 | |
| EVS (with two-class real-random classification AD) | 0.50 | 0.21 | 0.60 |
| Neutral | | | |
| TS (5-fold CV) | 0.70 | 0.18 | - |
| EVS | 0.47 | 0.19 | |
| EVS (with two-class real-random classification AD) | 0.75 | 0.16 | 0.50 |
| Zwitterion | | | |
| TS (5-fold CV) | 0.64 | 0.18 | - |
| EVS | 0.46 | 0.21 | |
| EVS (with two-class real-random classification AD) | 0.86 | 0.23 | 0.62 |
List of Descriptors as Selected by VSURF Included in PPB Predictive Models
| Common descriptors | Exclusive descriptors for LogK | Exclusive descriptors for √fu |
|---|---|---|
| ALOGP | nCsp2 | CATS2D_01_LL |
| P_VSA_i_2 | MLOGP2 | nCar |
| MLOGP | GATS1i | SpMin1_Bh(i) |
| P_VSA_p_3 | SpMax2_Bh(p) | Eta_betaP_A |
| C% | nBM | SM12_AEA(ri) |
| CATS2D_00_LL | MATS5e | nN+ |
| Eta_betaP | AMW | |
| PCD | F01[C-N] | |
| Ui | T(O..O) | |
| N% | J_D/Dt | |
| C-024 | SpMax_AEA(dm) | |
| CATS2D_00_PP | ||
| totalcharge | | |
List of Chemical Categories Showing a High Error in Prediction (Only Categories with p < 0.05 are Shown)
| Name | Description | Original dataset | Likelihood ratio |
|---|---|---|---|
| Nq | quaternary N | Acid | 7.50 |
| N+ | positively charged N | Acid | 7.50 |
| RCOOR | esters (aliphatic) | Base | 1.85 |
| OHt | tertiary alcohols | Base | 1.98 |
| RCONH2 | primary amides (aliphatic) | CORAL | 2.01 |
| CH2RX | CH2RX | LogK | 3.58 |
| CONN | urea (-thio) derivatives | | 1.98 |
| ArOH | aromatic hydroxyls | | 2.12 |
| RCONHR | secondary amides (aliphatic) | | 5.12 |
List of Chemical Categories with a Small Error in Prediction (Only Categories with p < 0.05 are Shown)
| Name | Description | Original dataset | Likelihood ratio |
|---|---|---|---|
| Cq | total quaternary C(sp3) | Acid | 7.50 |
| Crq | ring quaternary C(sp3) | Acid | 7.50 |
| Cq | total quaternary C(sp3) | LogK | 1.85 |
| Beta-Lactams | Beta-Lactams | LogK | 1.98 |
| RSR | sulfides | LogK | 2.01 |
| Imidazoles | Imidazoles | | 3.58 |
| Crq | ring quaternary C(sp3) | | 1.98 |
| Cq | total quaternary C(sp3) | | 2.12 |
| OHp | primary alcohols | | 5.12 |