César R García-Jacas, Ernesto Contreras-Torres, Yovani Marrero-Ponce, Mario Pupo-Meriño, Stephen J Barigye, Lisset Cabrera-Leyva.
Abstract
BACKGROUND: Recently, novel 3D alignment-free molecular descriptors (also known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. These descriptors codify chemical information for relations between two, three and four atoms by using several (dis-)similarity metrics and multi-metrics. Several studies have assessed the quality of these descriptors, but a deeper analysis of their performance is still needed. The present manuscript therefore reports an assessment and statistical validation of the performance of these descriptors in QSAR studies.
Keywords: 3D-QSAR; Multiple Linear Regression; QuBiLS-MIDAS; TOMOCOMD-CARDD
Year: 2016 PMID: 26925168 PMCID: PMC4768433 DOI: 10.1186/s13321-016-0122-x
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Metrics used to compute the “distance” between two atoms of a molecule
| Metrics | Formulaᵃ | Rangeᵇ | Average | Range |
|---|---|---|---|---|
| Minkowski | | [0, ∞) | | [0, ∞) |
| Chebyshev/Lagrange | | | | |
| Canberra | | [0, | | [0, 1] |
| Lance–Williams/Bray–Curtis | | [0, 1] | | |
| Clark/coefficient of divergence | | [0, | | |
| Soergel | | [0, 1] | | |
| Bhattacharyya | | [0, ∞) | | [0, ∞) |
| Wave–Edges | | [0, | | [0, 1] |
| Angular separation/1 − Cosine | | [0, 2] | | |
ᵃ The variables x_j and y_j are the values of coordinate j of the atoms X and Y of a molecule, respectively. The h value is equal to 3 and corresponds to the 3D Cartesian coordinates (x, y, z) of an atom. The p values used in the Minkowski metric are 0.25, 0.5, 1 (Manhattan), 1.5, 2 (Euclidean), 2.5 and 3 (Minkowski)
ᵇ “Range” refers to “range” and not to “rank”, and is defined as Range = max{x} − min{x}
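The formula column of the table did not survive in this record, so the following is only a minimal sketch of two of the listed metrics using their textbook definitions (Minkowski of order p and Chebyshev), applied over h = 3 Cartesian coordinates. The function names and the example coordinates are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def minkowski(x: Point, y: Point, p: float) -> float:
    """Textbook Minkowski distance of order p (p = 1 Manhattan, p = 2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x: Point, y: Point) -> float:
    """Textbook Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def distance_matrix(coords: List[Point],
                    metric: Callable[[Point, Point], float]) -> List[List[float]]:
    """Pairwise (two-tuple) matrix of a metric over all atoms."""
    n = len(coords)
    return [[metric(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Hypothetical 3D coordinates (assumption: not from the source).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
euclid = distance_matrix(coords, lambda a, b: minkowski(a, b, 2))
manhattan = distance_matrix(coords, lambda a, b: minkowski(a, b, 1))
cheb = distance_matrix(coords, chebyshev)
```

Swapping the metric passed to `distance_matrix` is enough to obtain a different two-tuple matrix from the same coordinates, which mirrors how the table lists alternative metrics for the same pair of atoms.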
Measures used to compute the ternary (A) and quaternary (B) relations (multi-metrics) among atoms of a molecule
| Measure | Formula |
|---|---|
| (A) Ternary measures | |
| Perimeter | |
| Triangle area | |
| Sides summation | |
| Bond angle (angle between sides) | |
| (B) Quaternary measures | |
| Perimeter | |
| Volume | |
| Sides summation | |
| Dihedral angle | |
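Since the formulas of the measures are missing from this record, the sketch below assumes standard triangle geometry for three of the ternary measures: perimeter as the sum of the three pairwise distances, area via Heron's formula, and the angle between two sides via the law of cosines. The function name is hypothetical. As a plausibility check, the angle at O3 between C1 and C2, computed from the Euclidean distances tabulated later in this record (1.439, 1.438 and 2.408), comes out close to the 1.985 listed in the three-tuple example matrix, which suggests the angle is expressed in radians.

```python
import math
from typing import Tuple


def triangle_measures(d_ab: float, d_bc: float, d_ac: float) -> Tuple[float, float, float]:
    """Return (perimeter, area, angle at B in radians) for atoms A, B, C.

    Assumption: standard geometric definitions, not the paper's exact formulas.
    """
    perimeter = d_ab + d_bc + d_ac
    s = perimeter / 2.0
    # Heron's formula; clamp at zero to absorb rounding for degenerate triangles.
    area = math.sqrt(max(s * (s - d_ab) * (s - d_bc) * (s - d_ac), 0.0))
    # Law of cosines for the angle at B between sides BA and BC.
    cos_b = (d_ab ** 2 + d_bc ** 2 - d_ac ** 2) / (2.0 * d_ab * d_bc)
    angle_b = math.acos(max(-1.0, min(1.0, cos_b)))
    return perimeter, area, angle_b


# Angle at O3 between C1 and C2, using the tabulated Euclidean distances.
_, _, angle_c1_o3_c2 = triangle_measures(1.439, 1.438, 2.408)
```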
(A) Chemical structure of chloro(methoxy)methane and its labeled molecular scaffold, (B) examples of two-tuple total spatial-(dis)similarity matrices for k = 1 (order) calculated from different (dis-)similarity metrics, (C) example of three-tuple total spatial-(dis)similarity matrix for k = 1 (order) calculated from the bond angle ternary measure
| (A) 3D molecular structure | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| (B) Two-tuple total spatial-(dis)similarity matrices | | | | | | | | |
| | C1 | C2 | O3 | Cl4 | C1 | C2 | O3 | Cl4 |
| C1 | 0.000 | 2.408 | 1.439 | 3.939 | 0.000 | 1.000 | 0.973 | 1.000 |
| C2 | 2.408 | 0.000 | 1.438 | 1.757 | 1.000 | 0.000 | 0.954 | 0.293 |
| O3 | 1.439 | 1.438 | 0.000 | 2.598 | 0.973 | 0.954 | 0.000 | 0.973 |
| Cl4 | 3.939 | 1.757 | 2.598 | 0.000 | 1.000 | 0.293 | 0.973 | 0.000 |
| | C1 | C2 | O3 | Cl4 | C1 | C2 | O3 | Cl4 |
| C1 | 0.000 | 1.158 | 1.003 | 1.709 | 0.000 | 1.354 | 0.558 | 1.875 |
| C2 | 1.158 | 0.000 | 1.234 | 1.359 | 1.354 | 0.000 | 0.318 | 0.237 |
| O3 | 1.003 | 1.234 | 0.000 | 2.235 | 0.558 | 0.318 | 0.000 | 0.952 |
| Cl4 | 1.709 | 1.359 | 2.235 | 0.000 | 1.875 | 0.237 | 0.952 | 0.000 |
| (C) Three-tuple total spatial-(dis)similarity matrix | | | | | | | | |
| | C1 | C2 | O3 | Cl4 | C1 | C2 | O3 | Cl4 |
| C1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.578 | 0.281 |
| C2 | 0.000 | 0.000 | 0.578 | 2.470 | 0.000 | 0.000 | 0.000 | 0.000 |
| O3 | 0.000 | 1.985 | 0.000 | 2.682 | 1.985 | 0.000 | 0.000 | 0.697 |
| Cl4 | 0.000 | 0.390 | 0.163 | 0.000 | 0.390 | 0.000 | 0.553 | 0.000 |
| | C1 | C2 | O3 | Cl4 | C1 | C2 | O3 | Cl4 |
| C1 | 0.000 | 0.578 | 0.000 | 0.297 | 0.000 | 0.281 | 0.297 | 0.000 |
| C2 | 0.578 | 0.000 | 0.000 | 1.892 | 2.470 | 0.000 | 1.892 | 0.000 |
| O3 | 0.000 | 0.000 | 0.000 | 0.000 | 2.682 | 0.697 | 0.000 | 0.000 |
| Cl4 | 0.163 | 0.553 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
(A) Two-tuple spatial-(dis)similarity matrix for k = 1, computed from the 3D coordinates of the molecule chloro(methoxy)methane (see Table 1A), (B) examples of two-tuple spatial-(dis)similarity matrices obtained with different chemical fragments
| | C1 | C2 | O3 | Cl4 |
|---|---|---|---|---|
| (A) Two-tuple | | | | |
| C1 | 0.000 | 2.408 | 1.439 | 3.939 |
| C2 | 2.408 | 0.000 | 1.438 | 1.757 |
| O3 | 1.439 | 1.438 | 0.000 | 2.598 |
| Cl4 | 3.939 | 1.757 | 2.598 | 0.000 |
| (B) two-tuple | | | | |
| C1 | 0.000 | 0.000 | 0.000 | 1.969 |
| C2 | 0.000 | 0.000 | 0.000 | 0.878 |
| O3 | 0.000 | 0.000 | 0.000 | 1.299 |
| Cl4 | 1.969 | 0.878 | 1.299 | 0.000 |
| | | | | |
| C1 | 0.000 | 1.204 | 0.719 | 1.969 |
| C2 | 1.204 | 0.000 | 0.000 | 0.000 |
| O3 | 0.719 | 0.000 | 0.000 | 0.000 |
| Cl4 | 1.969 | 0.000 | 0.000 | 0.000 |
| | | | | |
| C1 | 0.000 | 0.000 | 0.719 | 1.969 |
| C2 | 0.000 | 0.000 | 0.719 | 0.878 |
| O3 | 0.719 | 0.719 | 0.000 | 2.598 |
| Cl4 | 1.969 | 0.878 | 2.598 | 0.000 |
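The fragment labels are missing from this record, but the numbers in part (B) are consistent with a simple rule applied to the total matrix in part (A): an entry is kept in full when both atoms belong to the fragment, halved when exactly one does, and set to zero otherwise. The sketch below implements that rule as an assumption inferred from the tabulated values, not as the paper's stated definition; the function name and the fragment index sets are hypothetical.

```python
from typing import Iterable, List

# Total two-tuple matrix from part (A); atom order C1, C2, O3, Cl4.
TOTAL = [
    [0.000, 2.408, 1.439, 3.939],
    [2.408, 0.000, 1.438, 1.757],
    [1.439, 1.438, 0.000, 2.598],
    [3.939, 1.757, 2.598, 0.000],
]


def fragment_matrix(total: List[List[float]], fragment: Iterable[int]) -> List[List[float]]:
    """Weight each entry by the fraction of its two atoms that lie in the fragment.

    Assumed rule (inferred from the table): weight 1 if both atoms are in the
    fragment, 1/2 if exactly one is, 0 if neither is.
    """
    members = set(fragment)
    n = len(total)
    return [
        [total[i][j] * ((i in members) + (j in members)) / 2.0 for j in range(n)]
        for i in range(n)
    ]


only_cl4 = fragment_matrix(TOTAL, {3})      # matches the first matrix in (B)
only_c1 = fragment_matrix(TOTAL, {0})       # matches the second matrix in (B)
o3_and_cl4 = fragment_matrix(TOTAL, {2, 3})  # matches the third matrix in (B)
```

The tabulated values agree with this rule to within rounding in the third decimal (for example 3.939 / 2 = 1.9695 against the listed 1.969).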
Example of probabilistic transformations on the non-stochastic two-tuple total spatial-(dis)similarity matrix for k = 1, computed from the 3D coordinates of the chloro(methoxy)methane compound (see Table 1A) by using the Euclidean metric
| | C1 | C2 | O3 | Cl4 | C1 | C2 | O3 | Cl4 |
|---|---|---|---|---|---|---|---|---|
| | Non-stochastic matrix | | | | Simple-stochastic matrix | | | |
| C1 | 0.000 | 2.408 | 1.439 | 3.939 | 0.000 | 0.309 | 0.185 | 0.506 |
| C2 | 2.408 | 0.000 | 1.438 | 1.757 | 0.430 | 0.000 | 0.257 | 0.314 |
| O3 | 1.439 | 1.438 | 0.000 | 2.598 | 0.263 | 0.263 | 0.000 | 0.475 |
| Cl4 | 3.939 | 1.757 | 2.598 | 0.000 | 0.475 | 0.212 | 0.313 | 0.000 |
| | Double-stochastic matrix | | | | Mutual probability matrix | | | |
| C1 | 0.000 | 0.387 | 0.246 | 0.368 | 0.000 | 0.089 | 0.053 | 0.145 |
| C2 | 0.387 | 0.000 | 0.368 | 0.246 | 0.089 | 0.000 | 0.053 | 0.065 |
| O3 | 0.246 | 0.368 | 0.000 | 0.387 | 0.053 | 0.053 | 0.000 | 0.096 |
| Cl4 | 0.368 | 0.246 | 0.387 | 0.000 | 0.145 | 0.065 | 0.096 | 0.000 |
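Two of the tabulated transformations can be reproduced from the non-stochastic matrix with very little code. In this sketch, which is an interpretation inferred from the numbers rather than a definition quoted from the paper, the simple-stochastic matrix divides each entry by its row sum and the mutual probability matrix divides each entry by the sum of all entries. The double-stochastic matrix is not covered because its scaling procedure cannot be recovered from this record. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Non-stochastic Euclidean matrix; atom order C1, C2, O3, Cl4.
NON_STOCHASTIC: Matrix = [
    [0.000, 2.408, 1.439, 3.939],
    [2.408, 0.000, 1.438, 1.757],
    [1.439, 1.438, 0.000, 2.598],
    [3.939, 1.757, 2.598, 0.000],
]


def simple_stochastic(m: Matrix) -> Matrix:
    """Divide every entry by the sum of its row, so each row sums to 1."""
    result = []
    for row in m:
        row_sum = sum(row)
        result.append([v / row_sum if row_sum else 0.0 for v in row])
    return result


def mutual_probability(m: Matrix) -> Matrix:
    """Divide every entry by the sum of all entries, so the matrix sums to 1."""
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m]


ss = simple_stochastic(NON_STOCHASTIC)
mp = mutual_probability(NON_STOCHASTIC)
```

Row normalization breaks the symmetry of the original matrix (0.309 for C1 to C2 against 0.430 for C2 to C1), whereas division by the grand total preserves it, as the table shows.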
Scheme 1. General workflow for calculating the QuBiLS-MIDAS molecular descriptors. (1) Computation of the molecular vectors according to the selected atomic properties; (2) computation, from the 3D Cartesian coordinates of each atom of a molecule, of the non-stochastic two-tuple, three-tuple or four-tuple total spatial-(dis)similarity matrices for k = 1; (3) consideration of atom types or local fragments (optional); (4) computation of the simple-stochastic, double-stochastic and mutual probability matrices, and determination of the kth matrices through the Hadamard product up to the selected k value; (5) splitting of the calculated matrices into atom-level matrices; (6) computation of the atom-level indices (descriptors) using the molecular vectors calculated in step (1); and (7) application of the selected aggregation operators over the vector of atom-level descriptors
Scheme 2. General workflow for the calculation of a two-linear descriptor based on the linear algebraic form, Euclidean metric, non-stochastic matrix approach, atomic mass as property and Manhattan aggregation operator. (1) Computation of the non-stochastic matrix for k = 1 from the 3D coordinates matrix using the Euclidean metric; (2) computation of the molecular vector based on the atomic mass property; (3) splitting of the matrix into n (number of atoms) atom-level matrices, one for each atom a of the molecule; (4) computation of the atom-level descriptors and saving them into a vector; and (5) application of the Manhattan aggregation operator over the entries of this vector, the resulting value being the molecular descriptor
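The five steps of Scheme 2 can be sketched as follows. Several details are assumptions rather than statements from this record: the atom-level splitting rule (half of each entry in the row or column of atom a) is inferred from the fragment example table, the two-linear form is taken as xᵀ·M·y with both vectors equal to the mass vector, only the four labeled heavy atoms are used, and the Manhattan operator is taken as the sum of absolute values. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Step 1: non-stochastic Euclidean matrix (k = 1); atom order C1, C2, O3, Cl4.
EUCLID: Matrix = [
    [0.000, 2.408, 1.439, 3.939],
    [2.408, 0.000, 1.438, 1.757],
    [1.439, 1.438, 0.000, 2.598],
    [3.939, 1.757, 2.598, 0.000],
]

# Step 2: molecular vector of standard atomic masses (assumption: heavy atoms only).
MASS = [12.011, 12.011, 15.999, 35.45]


def atom_level_matrix(total: Matrix, a: int) -> Matrix:
    """Step 3 (assumed rule): keep half of each entry in row or column a."""
    n = len(total)
    return [
        [total[i][j] * ((i == a) + (j == a)) / 2.0 for j in range(n)]
        for i in range(n)
    ]


def bilinear_form(x: List[float], m: Matrix, y: List[float]) -> float:
    """Evaluate x^T M y."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def manhattan_aggregation(values: List[float]) -> float:
    """Step 5 (assumed): sum of absolute values of the atom-level descriptors."""
    return sum(abs(v) for v in values)


# Step 4: one atom-level descriptor per atom.
atom_descriptors = [
    bilinear_form(MASS, atom_level_matrix(EUCLID, a), MASS) for a in range(len(MASS))
]
descriptor = manhattan_aggregation(atom_descriptors)
```

Under this splitting rule the atom-level matrices add up to the total matrix, so with non-negative entries the aggregated value equals the two-linear form evaluated on the total matrix. Other aggregation operators would combine the same atom-level values differently.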
Statistical parameters and equations of the best models developed for each chemical dataset analyzed
| Size | | | | a( | | SDEPext | Modelsᵃ |
|---|---|---|---|---|---|---|---|
| ACE dataset | | | | | | | |
| 6 | 0.814 | 0.7756 | 0.765 | −0.169 | 0.7422 | 1.078 | |
| ACHE dataset | | | | | | | |
| 8 | 0.738 | 0.6574 | 0.626 | −0.213 | 0.6309 | 0.784 | |
| BZR dataset | | | | | | | |
| 9 | 0.754 | 0.6931 | 0.669 | −0.170 | 0.5692 | 0.631 | |
| COX2 dataset | | | | | | | |
| 9 | 0.670 | 0.6313 | 0.615 | −0.091 | 0.4932 | 1.038 | |
| DHFR dataset | | | | | | | |
| 9 | 0.732 | 0.7055 | 0.697 | −0.077 | 0.6405 | 0.826 | |
| GPB dataset | | | | | | | |
| 8 | 0.893 | 0.8124 | 0.774 | −0.394 | 0.8283 | 0.499 | |
| THER dataset | | | | | | | |
| 7 | 0.815 | 0.7530 | 0.723 | −0.260 | 0.7248 | 1.197 | |
| THR dataset | | | | | | | |
| 9 | 0.866 | 0.8149 | 0.789 | −0.286 | 0.7674 | 0.540 | |

ᵃ See Additional file 1: Table S7 for nomenclature of the QuBiLS-MIDAS descriptors
Comparison of the cross-validation statistic parameter obtained from the QuBiLS-MIDAS models with respect to the performance achieved by 15 QSAR procedures
| | ACE | ACHE | BZR | COX2 | DHFR | GPB | THER | THR |
|---|---|---|---|---|---|---|---|---|
| QuBiLS-MIDASᵃ | | | | | | | | |
| QuBiLS-MIDASᵇ | | | | | | | | |
| CoMFA | 0.68 | 0.52 | 0.32 | 0.49 | 0.65 | 0.42 | 0.52 | 0.59 |
| COMSIA basic | 0.65 | 0.48 | 0.41 | 0.43 | 0.63 | 0.43 | 0.54 | 0.62 |
| COMSIA extra | 0.66 | 0.49 | 0.45 | | 0.65 | 0.61 | 0.51 | |
| EVA | 0.70 | 0.42 | 0.40 | 0.45 | 0.64 | 0.58 | 0.48 | 0.47 |
| HQSAR | | 0.34 | 0.42 | 0.50 | 0.69 | | 0.49 | 0.50 |
| 2D | 0.68 | 0.32 | 0.36 | 0.49 | 0.51 | 0.31 | 0.62 | 0.62 |
| 2.5D | | 0.31 | 0.35 | 0.55 | 0.53 | 0.46 | 0.66 | 0.52 |
| SAMFA-RF | 0.69 | | 0.43 | 0.38 | 0.70 | | 0.52 | 0.53 |
| SAMFA-SVM | 0.52 | 0.29 | 0.38 | 0.39 | 0.57 | 0.53 | 0.18 | 0.39 |
| SAMFA-PLS | 0.65 | 0.54 | 0.49 | 0.40 | 0.68 | 0.61 | 0.60 | 0.56 |
| Fingerprints Library | 0.69 | 0.57 | | 0.55 | | 0.53 | 0.53 | 0.58 |
| O3Q | 0.69 | 0.52 | 0.42 | 0.48 | 0.70 | 0.55 | 0.48 | 0.59 |
| O3QMFA | 0.65 | 0.41 | 0.41 | 0.43 | 0.69 | 0.30 | 0.47 | 0.65 |
| O3A/O3Q | 0.71 | 0.55 | 0.46 | 0.46 | 0.66 | 0.50 | | 0.68 |
| COSMO | 0.71 | 0.53 | 0.45 | 0.54 | 0.69 | 0.61 | 0.58 | 0.74 |

ᵃ Values corresponding to the best model reported considering total and local-fragment QuBiLS-MIDAS indices (see Table 6)
ᵇ Values corresponding to the best model reported considering only total QuBiLS-MIDAS indices (see Additional file 1: Table S4)
Italic values correspond to the best results reported in the literature and those obtained by the QuBiLS-MIDAS 3D-MDs
Comparison of the external predictive accuracy attained by the QuBiLS-MIDAS models with respect to the generalization ability achieved with 12 QSAR procedures
| | ACE | ACHE | BZR | COX2 | DHFR | GPB | THER | THR |
|---|---|---|---|---|---|---|---|---|
| QuBiLS-MIDASᵃ | | | | | | | | |
| QuBiLS-MIDASᵇ | | | | | | | | |
| CoMFA | 0.49 | 0.47 | 0.00 | 0.29 | 0.59 | 0.42 | 0.54 | 0.63 |
| COMSIA basic | 0.52 | 0.44 | 0.08 | 0.03 | 0.52 | 0.46 | 0.36 | 0.55 |
| COMSIA extra | 0.49 | 0.44 | 0.12 | 0.37 | 0.53 | 0.59 | 0.53 | 0.63 |
| EVA | 0.36 | 0.28 | 0.16 | 0.17 | 0.57 | 0.49 | 0.36 | 0.11 |
| HQSAR | 0.30 | 0.37 | 0.17 | 0.27 | 0.63 | 0.58 | 0.53 | −0.25 |
| 2D | 0.47 | 0.16 | 0.14 | 0.25 | 0.47 | −0.06 | 0.14 | 0.04 |
| 2.5D | 0.51 | 0.16 | 0.20 | 0.27 | 0.49 | 0.04 | 0.07 | 0.28 |
| O3Q | 0.69 | 0.67 | 0.17 | 0.32 | 0.60 | 0.50 | 0.51 | 0.67 |
| O3QMFA | 0.45 | 0.61 | 0.13 | 0.37 | 0.59 | 0.29 | 0.49 | 0.60 |
| O3A/O3Q | 0.54 | 0.65 | 0.24 | 0.28 | 0.53 | 0.41 | −0.18 | 0.30 |
| COSMO | 0.62 | 0.61 | 0.13 | | 0.58 | 0.63 | 0.59 | 0.66 |
| 2D-FPT | | | | 0.329ᴺ | | | | |

ᵃ Values corresponding to the best model reported considering total and local-fragment QuBiLS-MIDAS indices (see Table 6)
ᵇ Values corresponding to the best model reported considering only total QuBiLS-MIDAS indices (see Additional file 1: Table S4)
ᴸ 2D-FPT-based linear models
ᴺ 2D-FPT-based non-linear models
Italic values correspond to the best results reported in the literature and those obtained by the QuBiLS-MIDAS 3D-MDs
Fig. 1. Boxplot graphic for the external predictive accuracy achieved by each QSAR methodology considered in this manuscript
Wilcoxon signed-rank test for pairwise multiple hypothesis tests, using the Benjamini–Hochberg (BH) adjustment to control the false discovery rate (FDR). The values shown are one-tailed p-values for the “greater” alternative
| | 2D | 2.5D | EVA | COMSIA basic | HQSAR | O3QMFA | CoMFA | O3A/O3Q | COMSIA extra | COSMOsar3D | O3Q | 2D-FPT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.5D | 0.115 | – | – | – | – | – | – | – | – | – | – | – |
| EVA | 0.138 | 0.402 | – | – | – | – | – | – | – | – | – | – |
| COMSIA basic | 0.137 | 0.115 | 0.323 | – | – | – | – | – | – | – | – | – |
| HQSAR | 0.203 | 0.380 | 0.197 | 0.402 | – | – | – | – | – | – | – | – |
| O3QMFA | 0.046 | 0.046 | 0.138 | 0.241 | 0.312 | – | – | – | – | – | – | – |
| CoMFA | 0.051 | 0.089 | 0.115 | 0.241 | 0.367 | 0.703 | – | – | – | – | – | – |
| O3A/O3Q | 0.089 | 0.089 | 0.277 | 0.556 | 0.402 | 0.654 | 0.727 | – | – | – | – | – |
| COMSIA extra | 0.031 | 0.051 | 0.045 | 0.051 | 0.164 | 0.427 | 0.249 | 0.272 | – | – | – | – |
| COSMOsar3D | 0.027 | 0.022 | 0.036 | 0.022 | 0.051 | 0.054 | 0.027 | 0.068 | 0.015 | – | – | – |
| O3Q | 0.015 | 0.022 | 0.022 | 0.015 | 0.186 | 0.051 | 0.042 | 0.051 | 0.203 | 0.698 | – | – |
| 2D-FPT | 0.015 | 0.015 | 0.015 | 0.015 | 0.015 | 0.022 | 0.015 | 0.015 | 0.022 | 0.068 | 0.015 | – |
| QuBiLS-MIDAS | | | | | | | | | | | | |
Italic values indicate statistically significant differences of the QuBiLS-MIDAS models with respect to the other QSAR methodologies
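The adjustment step named in the table caption can be illustrated with the standard Benjamini–Hochberg step-up procedure. The sketch below covers only that adjustment; the Wilcoxon signed-rank test itself is not implemented. The input p-values are made-up numbers for illustration, not values from the paper, and the function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order.

    Each p-value is multiplied by m / rank (rank in ascending order), then a
    running minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative raw p-values (assumption: not taken from the source).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

A comparison is then called significant at a chosen FDR level when its adjusted p-value falls below that level.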
Fig. 2. Boxplot graphic for the external predictive accuracy achieved by the QSAR models reported in this manuscript (see Table 6) and fitted using structures generated by CORINA software, over the corresponding test sets optimized by five different toolkits
External predictive accuracy achieved by QSAR models developed from 3D molecular structures generated with six different programs
| | ACE | ACHE | BZR | COX2 | DHFR | GPB | THER | THR | Rank average |
|---|---|---|---|---|---|---|---|---|---|
| BALLOON | 0.3296 | 0.1943 | 0.3949 | 0.2451 | 0.3758 | 0.0000 | 0.0000 | 0.0000 | 4.5 |
| CHEMAXON | 0.5504 | 0.1343 | 0.4163 | 0.3361 | 0.2978 | 0.1687 | 0.0000 | 0.1386 | 3.375 |
| CORINA | 0.4133 | 0.0556 | 0.3628 | 0.2865 | 0.4288 | 0.2767 | 0.1915 | 0.2334 | 3.25 |
| FROG2 | 0.4832 | 0.3535 | 0.3635 | 0.3393 | 0.3786 | 0.2712 | 0.3264 | 0.1457 | 2.125 |
| OPENBABEL | 0.3993 | 0.1306 | 0.1715 | 0.2775 | 0.3460 | 0.4742 | 0.2806 | 0.0803 | 4 |
| RDKIT | 0.4181 | 0.1770 | 0.3024 | 0.2189 | 0.5008 | 0.4511 | 0.0000 | 0.0710 | 3.75 |
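The “Rank average” column can be reproduced from the other columns of the table. In this sketch, which is an interpretation checked against the tabulated numbers rather than a procedure quoted from the paper, the six programs are ranked within each dataset (rank 1 for the highest accuracy, tied values sharing the mean of their ranks) and the eight ranks are averaged per program. The function name is hypothetical.

```python
from typing import Dict, List

# External predictive accuracy per program; dataset order ACE, ACHE, BZR,
# COX2, DHFR, GPB, THER, THR (values copied from the table).
SCORES: Dict[str, List[float]] = {
    "BALLOON":   [0.3296, 0.1943, 0.3949, 0.2451, 0.3758, 0.0000, 0.0000, 0.0000],
    "CHEMAXON":  [0.5504, 0.1343, 0.4163, 0.3361, 0.2978, 0.1687, 0.0000, 0.1386],
    "CORINA":    [0.4133, 0.0556, 0.3628, 0.2865, 0.4288, 0.2767, 0.1915, 0.2334],
    "FROG2":     [0.4832, 0.3535, 0.3635, 0.3393, 0.3786, 0.2712, 0.3264, 0.1457],
    "OPENBABEL": [0.3993, 0.1306, 0.1715, 0.2775, 0.3460, 0.4742, 0.2806, 0.0803],
    "RDKIT":     [0.4181, 0.1770, 0.3024, 0.2189, 0.5008, 0.4511, 0.0000, 0.0710],
}


def rank_average(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average per-dataset rank of each program (1 = best, ties share mean rank)."""
    names = list(scores)
    n_sets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for k in range(n_sets):
        column = [scores[name][k] for name in names]
        for name in names:
            value = scores[name][k]
            higher = sum(1 for c in column if c > value)
            tied = sum(1 for c in column if c == value)  # includes itself
            totals[name] += higher + (tied + 1) / 2.0
    return {name: totals[name] / n_sets for name in names}


averages = rank_average(SCORES)
```

With this convention a lower rank average indicates better overall accuracy across the eight datasets, which places FROG2 first in the table.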