Jacob Spiegel, Hanoch Senderowitz.
Abstract
Quantitative Structure Activity Relationship (QSAR) models can inform on the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Due to the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example, R² and Q²CV. Similar metrics calculated on an external set of data (e.g., Q²F1/F2/F3) are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-"ignorant". In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by "classical" metrics, e.g., R² and Q²F1/F2/F3, and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable R² and/or Q²F1/F2/F3 values were unable to pick a single active compound from within the pool, whereas in other cases models with poor R² and/or Q²F1/F2/F3 values performed well in the context of virtual screening.
We also found no significant correlation between the number of active compounds correctly identified by the models in the training, validation and test sets. Next, we developed a new algorithm for the derivation of MLR models by optimizing an enrichment-based metric and tested its performance on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that, when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across training/validation/test sets in most cases. We attribute the better performance of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening.
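The idea of deriving a linear model by optimizing an enrichment-based metric can be illustrated with a toy search. This record gives no algorithmic details of EOA beyond the abstract and the "MC Steps" column in the tables, so everything below is an assumption made for illustration only: the Metropolis-style acceptance rule, the Gaussian moves on the weights, the temperature, and the hypothetical names `actives_in_top_l` and `mc_optimize`. It is a sketch of the general technique, not the authors' implementation.

```python
import math
import random


def actives_in_top_l(weights, X, y, l):
    """Count actives among the l compounds with the highest linear score."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in X]
    ranked = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return sum(y[i] for i in ranked[:l])


def mc_optimize(X, y, n_steps=2000, step=0.5, temperature=0.5, seed=0):
    """Monte Carlo search over linear weights, maximizing actives in the top L.

    Assumption: L is set to the number of actives in the set.
    """
    rng = random.Random(seed)
    l = sum(y)
    weights = [0.0] * len(X[0])
    current = actives_in_top_l(weights, X, y, l)
    best_w, best = list(weights), current
    for _ in range(n_steps):
        # Perturb one randomly chosen weight.
        trial = list(weights)
        trial[rng.randrange(len(trial))] += rng.gauss(0.0, step)
        score = actives_in_top_l(trial, X, y, l)
        # Accept improvements and ties; accept worse moves with Boltzmann probability.
        if score >= current or rng.random() < math.exp((score - current) / temperature):
            weights, current = trial, score
            if current > best:
                best_w, best = list(weights), current
    return best_w, best


# Toy data (hypothetical): 20 inactives followed by 5 actives, two descriptors.
# Descriptor 0 separates the classes; descriptor 1 is uninformative.
y = [0] * 20 + [1] * 5
X = [[float(label), ((7 * i) % 11) / 10.0] for i, label in enumerate(y)]

best_w, best = mc_optimize(X, y)
print(best, "of", sum(y), "actives ranked in the first L places")
```

The objective here is the count of actives in the first L places, which differs from an enrichment value only by a constant factor for a fixed dataset, so maximizing one maximizes the other.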
Keywords: QSAR equations; Quantitative Structure Activity Relationship (QSAR) models; enrichment optimizer algorithm (EOA); enrichment-based optimization; multiple linear regression (MLR); random forest (RF); support vector machine (SVM); virtual screening (VS)
Year: 2020 PMID: 33105703 PMCID: PMC7672587 DOI: 10.3390/ijms21217828
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Table 1. Results of the Multiple Linear Regression (MLR) models.
| Set | # Actives (= L) | # Descriptors | MC Steps | Train: R² | Train: # actives among first L (enrichment) | Validation: Q²F1 | Validation: Q²F2 | Validation: Q²F3 | Validation: # actives among first L (enrichment) | Test1: # actives among first L (enrichment) | Test2: # Actives (= L) | Test2: # actives among first L (enrichment) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M2 | 50 | 7 | 10⁶ | 0.77 | 48 (2.6) | 0.66 | 0.66 | 0.67 | 43 (2.4) | 32 (66.4) | 67 | 47 (54.5) |
| M2 | 50 | 10 | 10⁶ | 0.80 | 48 (2.6) | 0.63 | 0.63 | 0.64 | 45 (2.5) | 0 (0) | 67 | 1 (1.2) |
| M2 | 50 | 13 | 10⁶ | 0.82 | 48 (2.6) | 0.61 | 0.61 | 0.62 | 42 (2.3) | 0 (0) | 67 | 0 (0) |
| H1 | 50 | 7 | 10⁶ | 0.73 | 47 (2.5) | 0.59 | 0.59 | 0.63 | 42 (2.3) | 0 (0) | 42 | 0 (0) |
| H1 | 50 | 10 | 10⁶ | 0.78 | 48 (2.6) | 0.45 | 0.45 | 0.49 | 39 (2.1) | 0 (0) | 42 | 0 (0) |
| H1 | 50 | 13 | 10⁶ | 0.82 | 49 (2.6) | 0.56 | 0.56 | 0.60 | 41 (2.2) | 0 (0) | 42 | 0 (0) |
| 5HT2C | 50 | 7 | 10⁶ | 0.58 | 43 (2.4) | 0.14 | 0.14 | 0.18 | 32 (1.8) | 10 (20.8) | 58 | 21 (32.5) |
| 5HT2C | 50 | 10 | 10⁶ | 0.65 | 44 (2.5) | 0.08 | 0.08 | 0.12 | 34 (1.9) | 1 (2.1) | 58 | 5 (7.7) |
| 5HT2C | 50 | 13 | 10⁶ | 0.70 | 45 (2.5) | −0.08 | −0.08 | −0.03 | 30 (1.7) | 1 (2.1) | 58 | 6 (9.3) |
| hERG | 100 | 7 | 10⁶ | 0.34 | 67 (4.7) | 0.28 | 0.28 | 0.16 | 64 (4.5) | 87 (45.6) | 26 | 21 (160.5) |
| hERG | 100 | 10 | 10⁶ | 0.36 | 69 (4.8) | 0.32 | 0.31 | 0.23 | 68 (4.8) | 87 (45.6) | 26 | 23 (175.8) |
| hERG | 100 | 13 | 10⁶ | 0.39 | 71 (5.0) | 0.33 | 0.33 | 0.24 | 72 (5.0) | 91 (47.7) | 26 | 22 (168.2) |
| M3 | 75 | 7 | 10⁶ | 0.85 | 74 (2.0) | 0.66 | 0.66 | 0.68 | 68 (1.8) | 0 (0) | 4 | 0 (0) |
| M3 | 75 | 10 | 10⁶ | 0.89 | 74 (2.0) | 0.68 | 0.68 | 0.70 | 67 (1.8) | 0 (0) | 4 | 0 (0) |
| M3 | 75 | 13 | 10⁶ | 0.91 | 75 (2.0) | 0.73 | 0.73 | 0.75 | 70 (1.9) | 0 (0) | 4 | 0 (0) |
| D1 | 58 | 7 | 10⁶ | 0.83 | 57 (2.0) | 0.81 | 0.81 | 0.80 | 56 (1.9) | 20 (30.9) | 20 | 2 (25.8) |
| D1 | 58 | 10 | 10⁶ | 0.86 | 57 (2.0) | 0.77 | 0.77 | 0.75 | 57 (2.0) | 0 (0) | 20 | 0 (0) |
| D1 | 58 | 13 | 10⁶ | 0.88 | 58 (2.0) | 0.74 | 0.74 | 0.72 | 56 (1.9) | 0 (0) | 20 | 0 (0) |
| Alpha2C | 57 | 7 | 10⁶ | 0.77 | 53 (1.9) | 0.77 | 0.77 | 0.77 | 56 (2.0) | 33 (52.8) | 1 | 0 (0) |
| Alpha2C | 57 | 10 | 10⁶ | 0.80 | 53 (1.9) | 0.70 | 0.70 | 0.70 | 55 (1.9) | 26 (41.6) | 1 | 0 (0) |
| Alpha2C | 57 | 13 | 10⁶ | 0.83 | 54 (1.9) | 0.71 | 0.71 | 0.72 | 53 (1.9) | 29 (46.4) | 1 | 0 (0) |
Table 2. Best (out of five repeats, based on the performances on the test sets) results obtained for the seven datasets using the Enrichment Optimizer Algorithm (EOA). A compilation of all the results is provided in Tables S2–S8 (first sheet in each table) in the Supplementary Materials. Red, yellow and green coloring represent cases where EOA performances on test sets are poorer than, similar to or better than the corresponding Multiple Linear Regression (MLR) models presented in Table 1.
| Set | # Descriptors | # Actives (= L) | MC Steps | Train: # actives among first L (enrichment) | Validation: # actives among first L (enrichment) | Test1: # actives among first L (enrichment) | Test2: # Actives (= L) | Test2: # actives among first L (enrichment) |
|---|---|---|---|---|---|---|---|---|
| M2 | 7 | 50 | 10⁶ | 47 (2.5) | 40 (2.1) | 40 (83.1) | 67 | 56 (65.0) |
| M2 | 10 | 50 | 10⁶ | 47 (2.5) | 44 (2.4) | 39 (81.0) | 67 | 54 (62.6) |
| M2 | 13 | 50 | 10⁶ | 47 (2.5) | 42 (2.3) | 38 (78.9) | 67 | 54 (62.6) |
| H1 | 7 | 50 | 10⁶ | 48 (2.6) | 37 (2.1) | 31 (64.4) | 42 | 23 (67.6) |
| H1 | 10 | 50 | 10⁶ | 49 (2.6) | 42 (2.3) | 32 (66.4) | 42 | 24 (70.5) |
| H1 | 13 | 50 | 10⁶ | 48 (2.6) | 38 (2.0) | 32 (66.4) | 42 | 22 (64.6) |
| 5HT2C | 7 | 50 | 10⁶ | 45 (2.5) | 34 (1.9) | 0 (0) | 58 | 6 (9.3) |
| 5HT2C | 10 | 50 | 10⁶ | 45 (2.5) | 32 (1.8) | 1 (2.1) | 58 | 8 (12.4) |
| 5HT2C | 13 | 50 | 10⁶ | 47 (2.6) | 33 (1.8) | 0 (0) | 58 | 0 (0) |
| hERG | 7 | 100 | 10⁶ | 67 (4.7) | 60 (4.2) | 86 (45.1) | 26 | 22 (168.2) |
| hERG | 10 | 100 | 10⁶ | 75 (5.3) | 61 (4.3) | 89 (46.6) | 26 | 23 (175.8) |
| hERG | 13 | 100 | 10⁶ | 77 (5.4) | 59 (4.1) | 87 (45.6) | 26 | 23 (175.8) |
| M3 | 7 | 75 | 10⁶ | 74 (2.0) | 65 (1.7) | 49 (45.4) | 4 | 3 (964.7) |
| M3 | 10 | 75 | 10⁶ | 74 (2.0) | 67 (1.8) | 0 (0) | 4 | 0 (0) |
| M3 | 13 | 75 | 10⁶ | 74 (2.0) | 70 (1.9) | 57 (52.9) | 4 | 3 (964.7) |
| D1 | 7 | 58 | 10⁶ | 56 (1.9) | 54 (1.9) | 29 (44.8) | 20 | 11 (141.9) |
| D1 | 10 | 58 | 10⁶ | 57 (2.0) | 55 (1.9) | 20 (30.9) | 20 | 1 (12.9) * |
| D1 | 13 | 58 | 10⁶ | 57 (2.0) | 54 (1.9) | 41 (63.4) | 20 | 17 (219.3) |
| Alpha2C | 7 | 57 | 10⁶ | 49 (1.7) | 47 (1.6) | 25 (40.0) | 1 | 0 (0) |
| Alpha2C | 10 | 57 | 10⁶ | 49 (1.7) | 45 (1.6) | 25 (40.0) | 1 | 0 (0) |
| Alpha2C | 13 | 57 | 10⁶ | 56 (2.0) | 52 (1.8) | 15 (24.0) | 1 | 0 (0) |
* This is the only case where a different model gave better results on Test2. The statistics of that model, in terms of the number of active compounds retrieved within the first L places (enrichment), are: Training: 57 (2.0); Validation: 53 (1.8); Test1: 14 (21.6); Test2: 3 (38.7).
Figure 1. Principal Component Analysis (PCA) plots of training set (orange), validation set (grey) and test set 1 (blue) compounds in the space of the descriptors comprising the 7-descriptor model (left), 10-descriptor model (middle) and 13-descriptor model (right) for the M2 dataset. The first two PCs cover 49%, 42%, and 35% of the original variance for the 7-descriptor, 10-descriptor, and 13-descriptor models, respectively.
Table 3. A comparison of the performances of Enrichment Optimizer Algorithm (EOA) models with those of Random Forest (RF) and Support Vector Machine (SVM) models for the seven datasets considered in this work. For EOA we provide the results obtained with the best models as determined according to performances on test set 1. A complete listing of the results obtained with all models is provided in Tables S2–S8 (2nd–5th sheets) of the Supplementary Materials. Each cell contains the number of active compounds found within the first L places and, in parentheses, the enrichment calculated according to Equation (5) and the Matthews Correlation Coefficient (MCC). Green coloring represents the best models (as judged by performances on the test set) from among EOA, RF and SVM.
| Set | Run | EOA: Train | EOA: Validation | EOA: Test 1 | RF: Train | RF: Validation | RF: Test 1 | SVM: Train | SVM: Validation | SVM: Test 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| M2 | 1 | 46 (9.2; 0.91) | 41 (8.2; 0.80) | 50 (103.8; 1.00) | 42 (8.4; 0.88) | 39 (7.8; 0.77) | 39 (81.0; 0.88) | 43 (8.6; 0.92) | 36 (7.2; 0.84) | 36 (74.8; 0.85) |
| M2 | 2 | 46 (9.2; 0.91) | 40 (8.0; 0.78) | 48 (99.7; 0.96) | 42 (8.4; 0.83) | 37 (7.4; 0.73) | 37 (76.8; 0.86) | 42 (8.4; 0.83) | 37 (7.4; 0.84) | 37 (76.8; 0.86) |
| M2 | 3 | 45 (9.0; 0.89) | 41 (8.2; 0.80) | 47 (97.6; 0.94) | 43 (8.6; 0.86) | 42 (8.4; 0.79) | 42 (87.2; 0.92) | 41 (8.2; 0.84) | 37 (7.4; 0.73) | 37 (76.8; 0.86) |
| M2 | 4 | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 48 (99.7; 0.96) | 40 (8.0; 0.81) | 40 (8.0; 0.80) | 40 (83.1; 0.89) | 41 (8.2; 0.82) | 38 (7.6; 0.77) | 38 (78.9; 0.87) |
| H1 | 1 | 44 (8.8; 0.87) | 36 (7.2; 0.69) | 38 (78.9; 0.76) | 46 (9.2; 0.92) | 29 (5.8; 0.61) | 29 (60.2; 0.73) | 41 (8.2; 0.88) | 33 (6.6; 0.76) | 33 (68.5; 0.81) |
| H1 | 2 | 42 (8.4; 0.82) | 31 (6.2; 0.58) | 47 (97.6; 0.94) | 47 (9.4; 0.91) | 33 (6.6; 0.67) | 33 (68.5; 0.78) | 40 (8.0; 0.85) | 34 (6.8; 0.78) | 34 (70.6; 0.82) |
| H1 | 3 | 38 (7.6; 0.73) | 34 (6.8; 0.64) | 42 (87.2; 0.84) | 47 (9.4; 0.88) | 39 (7.8; 0.77) | 39 (81.0; 0.70) | 39 (7.8; 0.86) | 31 (6.2; 0.73) | 31 (64.4; 0.79) |
| H1 | 4 | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 39 (81.0; 0.78) | 41 (8.2; 0.82) | 27 (5.4; 0.46) | 27 (56.1; 0.52) | 39 (7.8; 0.79) | 27 (5.4; 0.7) | 27 (56.1; 0.73) |
| 5HT2C | 1 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.98) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 48 (9.6; 0.98) | 48 (99.7; 0.98) |
| 5HT2C | 2 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) |
| 5HT2C | 3 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.99) | 50 (10.0; 0.96) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) |
| 5HT2C | 4 | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 50 (103.8; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 50 (103.8; 1.00) |
| hERG | 1 | 41 (8.2; 0.80) | 29 (5.8; 0.53) | 42 (87.2; 0.84) | 39 (7.8; 0.83) | 23 (4.6; 0.41) | 23 (47.8; 0.31) | 35 (7.0; 0.81) | 19 (3.8; 0.55) | 19 (39.5; 0.61) |
| hERG | 2 | 42 (8.4; 0.82) | 22 (4.4; 0.38) | 38 (78.9; 0.76) | 42 (8.4; 0.90) | 26 (5.2; 0.56) | 26 (54.0; 0.33) | 38 (7.6; 0.84) | 19 (3.8; 0.52) | 19 (39.5; 0.61) |
| hERG | 3 | 39 (7.8; 0.76) | 23 (4.6; 0.40) | 45 (93.4; 0.90) | 35 (7.0; 0.77) | 32 (6.4; 0.68) | 32 (66.4; 0.64) | 31 (6.2; 0.77) | 22 (4.4; 0.63) | 22 (45.7; 0.66) |
| hERG | 4 | 38 (7.6; 0.73) | 29 (5.8; 0.53) | 47 (97.6; 0.94) | 34 (6.8; 0.78) | 19 (3.8; 0.49) | 19 (39.5; 0.55) | 33 (6.6; 0.8) | 18 (3.6; 0.56) | 18 (37.4; 0.60) |
| D1 | 1 | 42 (8.4; 0.82) | 37 (7.4; 0.71) | 44 (91.4; 0.88) | 42 (8.4; 0.86) | 33 (6.6; 0.71) | 33 (68.5; 0.69) | 45 (9.0; 0.94) | 44 (8.8; 0.84) | 44 (91.4; 0.94) |
| D1 | 2 | 42 (8.4; 0.82) | 33 (6.6; 0.62) | 46 (95.5; 0.92) | 50 (10.0; 0.97) | 31 (6.2; 0.71) | 31 (64.4; 0.55) | 44 (8.8; 0.91) | 35 (7.0; 0.81) | 35 (72.7; 0.84) |
| D1 | 3 | 41 (8.2; 0.80) | 41 (8.2; 0.80) | 45 (93.4; 0.90) | 48 (9.6; 0.93) | 37 (7.4; 0.68) | 37 (76.8; 0.71) | 46 (9.2; 0.94) | 36 (7.2; 0.79) | 36 (74.8; 0.85) |
| D1 | 4 | 39 (7.8; 0.76) | 37 (7.4; 0.71) | 45 (93.4; 0.90) | 48 (9.6; 0.98) | 40 (8.0; 0.82) | 40 (83.1; 0.87) | 42 (8.4; 0.91) | 37 (7.4; 0.80) | 37 (76.8; 0.86) |
| M3 | 1 | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 49 (101.7; 0.98) | 50 (10.0; 0.98) | 50 (10.0; 0.96) | 50 (103.8; 1.00) | 45 (9.0; 0.94) | 45 (9.0; 0.92) | 45 (93.4; 0.95) |
| M3 | 2 | 48 (9.6; 0.96) | 44 (8.8; 0.87) | 48 (99.7; 0.96) | 38 (7.6; 0.84) | 35 (7.0; 0.78) | 35 (72.7; 0.84) | 40 (8.0; 0.88) | 35 (7.0; 0.82) | 35 (72.7; 0.84) |
| M3 | 3 | 44 (8.8; 0.87) | 41 (8.2; 0.80) | 48 (99.7; 0.96) | 41 (8.2; 0.88) | 32 (6.4; 0.78) | 32 (66.4; 0.80) | 41 (8.2; 0.9) | 33 (6.6; 0.80) | 33 (68.5; 0.81) |
| M3 | 4 | 48 (9.6; 0.96) | 44 (8.8; 0.87) | 44 (91.4; 0.88) | 44 (8.8; 0.92) | 37 (7.4; 0.84) | 37 (76.8; 0.86) | 42 (8.4; 0.91) | 37 (7.4; 0.85) | 37 (76.8; 0.86) |
| Alpha2C | 1 | 40 (8.0; 0.78) | 34 (6.8; 0.64) | 37 (76.8; 0.74) | 42 (8.4; 0.86) | 35 (7.0; 0.72) | 35 (72.7; 0.84) | 37 (7.4; 0.85) | 33 (6.6; 0.77) | 33 (68.5; 0.81) |
| Alpha2C | 2 | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 44 (91.4; 0.88) | 42 (8.4; 0.87) | 38 (7.6; 0.78) | 38 (78.9; 0.87) | 39 (7.8; 0.86) | 34 (6.8; 0.78) | 34 (70.6; 0.82) |
| Alpha2C | 3 | 43 (8.6; 0.84) | 35 (7.0; 0.67) | 43 (89.3; 0.86) | 41 (8.2; 0.85) | 38 (7.6; 0.75) | 38 (78.9; 0.79) | 42 (8.4; 0.91) | 34 (6.8; 0.76) | 34 (70.6; 0.82) |
| Alpha2C | 4 | 40 (8.0; 0.78) | 37 (7.4; 0.71) | 42 (87.2; 0.84) | 39 (7.8; 0.81) | 40 (8.0; 0.81) | 40 (83.1; 0.89) | 44 (8.8; 0.93) | 41 (8.2; 0.82) | 41 (85.1; 0.90) |
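The MCC values in the EOA columns above appear consistent with treating the first L ranked compounds as predicted actives and all remaining compounds as predicted inactives, with L equal to the number of actives in the set. That reading is an assumption inferred from the tabulated numbers, not a definition given in this record, and the helper name `mcc_from_top_l` is hypothetical. The RF and SVM columns need not follow the same relation, since those classifiers can label any number of compounds as active. A minimal sketch:

```python
import math


def mcc_from_top_l(n_hits, n_actives, n_total, l=None):
    """Matthews Correlation Coefficient when the first L places are the predicted actives.

    n_hits    -- actives found among the first L places (true positives)
    n_actives -- actives in the whole set
    n_total   -- all compounds in the set (actives plus inactives or random)
    l         -- list length; assumed equal to n_actives when omitted
    """
    if l is None:
        l = n_actives
    tp = n_hits
    fp = l - n_hits
    fn = n_actives - n_hits
    tn = n_total - n_actives - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# M2, run 1, EOA, assuming sets of 50 actives and 450 inactives (500 compounds):
print(round(mcc_from_top_l(46, 50, 500), 2))   # table value for Train: 0.91
print(round(mcc_from_top_l(41, 50, 500), 2))   # table value for Validation: 0.80
# M2, run 3, EOA, Test 1, assuming 50 actives among 5141 random compounds:
print(round(mcc_from_top_l(47, 50, 5191), 2))  # table value: 0.94
```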
Table 4. The same as Table 3 but the best Enrichment Optimizer Algorithm (EOA) models were selected based on their performances on test set 2. Asterisks (*) denote cases where the same model performed best for both test set 1 and test set 2.
| Set | Run | EOA: Train | EOA: Validation | EOA: Test 2 | RF: Train | RF: Validation | RF: Test 2 | SVM: Train | SVM: Validation | SVM: Test 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| M2 | 1 * | 46 (9.2; 0.91) | 41 (8.2; 0.80) | 6 (857.8; 1.00) | 42 (8.4; 0.88) | 39 (7.8; 0.77) | 5 (714.9; 0.91) | 43 (8.6; 0.92) | 36 (7.2; 0.84) | 5 (714.9; 0.91) |
| M2 | 2 | 47 (9.4; 0.93) | 37 (7.4; 0.71) | 6 (857.8; 1.00) | 42 (8.4; 0.83) | 37 (7.4; 0.73) | 5 (714.9; 0.91) | 42 (8.4; 0.83) | 37 (7.4; 0.84) | 5 (714.9; 0.91) |
| M2 | 3 | 49 (9.8; 0.98) | 42 (8.4; 0.82) | 5 (714.9; 0.83) | 43 (8.6; 0.86) | 42 (8.4; 0.79) | 1 (143; 0.41) | 41 (8.2; 0.84) | 37 (7.4; 0.73) | 2 (285.9; 0.58) |
| M2 | 4 * | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 5 (714.9; 0.83) | 40 (8.0; 0.81) | 40 (8.0; 0.80) | 5 (714.9; 0.91) | 41 (8.2; 0.82) | 38 (7.6; 0.77) | 5 (714.9; 0.91) |
| H1 | 1 | 46 (9.2; 0.91) | 36 (7.2; 0.69) | 115 (32.8; 0.84) | 46 (9.2; 0.92) | 29 (5.8; 0.61) | 84 (24; 0.77) | 41 (8.2; 0.88) | 33 (6.6; 0.76) | 81 (23.1; 0.77) |
| H1 | 2 * | 42 (8.4; 0.82) | 31 (6.2; 0.58) | 128 (34.5; 0.91) | 47 (9.4; 0.91) | 33 (6.6; 0.67) | 85 (22.9; 0.76) | 40 (8.0; 0.85) | 34 (6.8; 0.78) | 81 (21.8; 0.76) |
| H1 | 3 * | 38 (7.6; 0.73) | 34 (6.8; 0.64) | 115 (34.3; 0.86) | 47 (9.4; 0.88) | 39 (7.8; 0.77) | 88 (26.2; 0.72) | 39 (7.8; 0.86) | 31 (6.2; 0.73) | 71 (21.2; 0.73) |
| H1 | 4 * | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 120 (33.3; 0.87) | 41 (8.2; 0.82) | 27 (5.4; 0.46) | 90 (25.0; 0.70) | 39 (7.8; 0.79) | 27 (5.4; 0.70) | 97 (26.9; 0.84) |
| 5HT2C | 1 * | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.98) | 98 (53.5; 1.00) | 50 (10.0; 1.00) | 48 (9.6; 0.98) | 97 (52.9; 0.99) |
| 5HT2C | 2 * | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 97 (54.0; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 97 (54; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 97 (54.0; 1.00) |
| 5HT2C | 3 * | 50 (10.0; 1.00) | 49 (9.8; 0.98) | 98 (53.5; 1.00) | 50 (10.0; 0.99) | 50 (10.0; 0.96) | 98 (53.5; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) |
| 5HT2C | 4 * | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) | 50 (10.0; 0.98) | 50 (10.0; 0.99) | 98 (53.5; 1.00) | 50 (10.0; 1.00) | 50 (10.0; 1.00) | 98 (53.5; 1.00) |
| hERG | 1 * | 41 (8.2; 0.80) | 29 (5.8; 0.53) | 112 (37.2; 0.89) | 39 (7.8; 0.83) | 23 (4.6; 0.41) | 65 (21.6; 0.47) | 35 (7.0; 0.81) | 19 (3.8; 0.55) | 62 (20.6; 0.70) |
| hERG | 2 * | 42 (8.4; 0.82) | 22 (4.4; 0.38) | 110 (78.9; 0.87) | 42 (8.4; 0.90) | 26 (5.2; 0.56) | 72 (23.9; 0.49) | 38 (7.6; 0.84) | 19 (3.8; 0.52) | 51 (16.9; 0.63) |
| hERG | 3 | 37 (7.4; 0.71) | 28 (5.6; 0.51) | 103 (34.2; 0.81) | 35 (7.0; 0.77) | 32 (6.4; 0.68) | 70 (23.2; 0.66) | 31 (6.2; 0.77) | 22 (4.4; 0.63) | 61 (20.2; 0.69) |
| hERG | 4 | 42 (8.4; 0.82) | 32 (6.4; 0.6) | 111 (36.8; 0.88) | 34 (6.8; 0.78) | 19 (3.8; 0.49) | 42 (13.9; 0.54) | 33 (6.6; 0.80) | 18 (3.6; 0.56) | 47 (15.6; 0.61) |
| D1 | 1 * | 42 (8.4; 0.82) | 37 (7.4; 0.71) | 40 (102.4; 0.89) | 42 (8.4; 0.86) | 33 (6.6; 0.71) | 35 (89.6; 0.75) | 45 (9.0; 0.94) | 44 (8.8; 0.84) | 38 (97.3; 0.92) |
| D1 | 2 | 42 (8.4; 0.82) | 37 (7.4; 0.71) | 41 (105; 0.91) | 50 (10.0; 0.97) | 31 (6.2; 0.71) | 35 (89.6; 0.64) | 44 (8.8; 0.91) | 35 (7.0; 0.81) | 35 (89.6; 0.88) |
| D1 | 3 | 41 (8.2; 0.80) | 37 (7.4; 0.71) | 38 (93.2; 0.82) | 48 (9.6; 0.93) | 37 (7.4; 0.68) | 39 (95.6; 0.77) | 46 (9.2; 0.94) | 36 (7.2; 0.79) | 35 (85.8; 0.87) |
| D1 | 4 | 40 (8.0; 0.78) | 37 (7.4; 0.71) | 43 (105.4; 0.93) | 48 (9.6; 0.98) | 40 (8.0; 0.82) | 44 (107.9; 0.96) | 42 (8.4; 0.91) | 37 (7.4; 0.80) | 39 (95.6; 0.92) |
| M3 | 1 * | 44 (8.8; 0.87) | 42 (8.4; 0.82) | 65 (77.7; 0.98) | 50 (10.0; 0.98) | 50 (10.0; 0.96) | 66 (78.9; 1.00) | 45 (9.0; 0.94) | 45 (9.0; 0.92) | 55 (65.7; 0.91) |
| M3 | 2 * | 48 (9.6; 0.96) | 44 (8.8; 0.87) | 66 (78.9; 1.00) | 38 (7.6; 0.84) | 35 (7.0; 0.78) | 47 (56.2; 0.84) | 40 (8.0; 0.88) | 35 (7.0; 0.82) | 48 (57.4; 0.85) |
| M3 | 3 * | 44 (8.8; 0.87) | 41 (8.2; 0.8) | 64 (76.5; 0.97) | 41 (8.2; 0.88) | 32 (6.4; 0.78) | 40 (47.8; 0.78) | 41 (8.2; 0.90) | 33 (6.6; 0.80) | 43 (51.4; 0.81) |
| M3 | 4 | 44 (8.8; 0.87) | 41 (8.2; 0.8) | 61 (72.9; 0.92) | 44 (8.8; 0.92) | 37 (7.4; 0.84) | 47 (56.2; 0.84) | 42 (8.4; 0.91) | 37 (7.4; 0.85) | 46 (55; 0.83) |
| Alpha2C | 1 * | 40 (8.0; 0.78) | 34 (6.8; 0.64) | 6 (255.5; 0.54) | 42 (8.4; 0.86) | 35 (7.0; 0.72) | 8 (340.6; 0.85) | 37 (7.4; 0.85) | 33 (6.6; 0.77) | 7 (298.0; 0.80) |
| Alpha2C | 2 * | 43 (8.6; 0.84) | 29 (5.8; 0.53) | 9 (383.2; 0.82) | 42 (8.4; 0.87) | 38 (7.6; 0.78) | 9 (383.2; 0.90) | 39 (7.8; 0.86) | 34 (6.8; 0.78) | 8 (340.6; 0.85) |
| Alpha2C | 3 * | 43 (8.6; 0.84) | 35 (7.0; 0.67) | 8 (340.6; 0.73) | 41 (8.2; 0.85) | 38 (7.6; 0.75) | 7 (298.1; 0.54) | 42 (8.4; 0.91) | 34 (6.8; 0.76) | 7 (298.0; 0.80) |
| Alpha2C | 4 * | 40 (8.0; 0.78) | 37 (7.4; 0.71) | 6 (482.7; 0.75) | 39 (7.8; 0.81) | 40 (8.0; 0.81) | 5 (402.3; 0.79) | 44 (8.8; 0.93) | 41 (8.2; 0.82) | 6 (482.7; 0.87) |
Description of the seven datasets used for the derivation of Multiple Linear Regression (MLR) models. The “Maximal Enrichment” column provides the maximal possible enrichment at L, attainable for the data set for a comparison with the enrichment values provided in Table 1 and Table 2.
| Dataset | # Descriptors | Training Set: # Actives | Training Set: # Inactives | Training Set: Maximal Enrichment | Validation Set: # Actives | Validation Set: # Inactives | Validation Set: Maximal Enrichment | Test Set 1: # Actives | Test Set 1: # Random | Test Set 1: Maximal Enrichment | Test Set 2: # Actives | Test Set 2: # Random | Test Set 2: Maximal Enrichment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5HT2C | 7, 10, 13 | 50 | 87 | 2.7 | 50 | 87 | 2.7 | 50 | 5141 | 103.8 | 67 | 5141 | 77.7 |
| M2 | 7, 10, 13 | 50 | 84 | 2.7 | 50 | 84 | 2.7 | 50 | 5141 | 103.8 | 42 | 5141 | 123.4 |
| H1 | 7, 10, 13 | 50 | 90 | 2.8 | 50 | 90 | 2.8 | 50 | 5141 | 103.8 | 58 | 5141 | 89.6 |
| hERG | 7, 10, 13 | 100 | 600 | 7.0 | 100 | 600 | 7.0 | 100 | 5141 | 52.4 | 26 | 5141 | 198.7 |
| M3 | 7, 10, 13 | 75 | 75 | 2.0 | 75 | 75 | 2.0 | 75 | 5141 | 69.5 | 4 | 5141 | 1286.3 |
| D1 | 7, 10, 13 | 58 | 58 | 2.0 | 58 | 58 | 2.0 | 58 | 5141 | 89.6 | 20 | 5141 | 258.1 |
| Alpha2C | 7, 10, 13 | 57 | 57 | 2.0 | 57 | 57 | 2.0 | 57 | 5141 | 91.2 | 1 | 5141 | 5142.0 |
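The "Maximal Enrichment" values in the table above can be reproduced with a simple ratio. Equation (5) of the paper is not included in this record, so the formula below is an assumption that happens to match the tabulated numbers: enrichment at L is the fraction of actives among the first L places divided by the fraction of actives in the whole set, with L equal to the number of actives. The function names are hypothetical.

```python
def enrichment_at_l(n_hits, n_actives, n_total, l=None):
    """Enrichment of actives among the first L ranked compounds.

    Assumed form: (n_hits / L) / (n_actives / n_total),
    with L defaulting to the number of actives in the set.
    """
    if l is None:
        l = n_actives
    return (n_hits / l) / (n_actives / n_total)


def maximal_enrichment(n_actives, n_total, l=None):
    """Enrichment reached when as many of the first L places as possible hold actives."""
    if l is None:
        l = n_actives
    return enrichment_at_l(min(l, n_actives), n_actives, n_total, l)


# 5HT2C training set: 50 actives + 87 inactives (table value: 2.7)
print(round(maximal_enrichment(50, 50 + 87), 1))
# Test set 1 with 50 actives + 5141 random compounds (table value: 103.8)
print(round(maximal_enrichment(50, 50 + 5141), 1))
# hERG training set: 100 actives + 600 inactives (table value: 7.0)
print(round(maximal_enrichment(100, 100 + 600), 1))
# M2 7-descriptor MLR model on Test1: 32 of 50 actives retrieved (MLR results table: 66.4)
print(round(enrichment_at_l(32, 50, 50 + 5141), 1))
```

With L equal to the number of actives, the maximal enrichment reduces to the total number of compounds divided by the number of actives, which is why test sets with very few actives among 5141 random compounds show very large maximal values.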
Description of the four datasets used for the derivation of Random Forest (RF) and Support Vector Machine (SVM) models. The “Maximal Enrichment” column provides the maximal possible enrichment at L, attainable for the data set for a comparison with the enrichment values provided in Table 3 and Table 4.
| Dataset | # Descriptors | Training Set: # Actives | Training Set: # Inactives | Training Set: Maximal Enrichment | Validation Set: # Actives | Validation Set: # Inactives | Validation Set: Maximal Enrichment | Test Set 1: # Actives | Test Set 1: # Inactives | Test Set 1: Maximal Enrichment | Test Set 2: # Actives | Test Set 2: # Inactives | Test Set 2: Maximal Enrichment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5HT2C | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 97–98 * | 5141 | 53.4–54.0 |
| M2 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 6 | 5141 | 857.8 |
| H1 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 133–140 * | 5141 | 37.7–39.7 |
| hERG | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 126 | 5141 | 41.8 |
| M3 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 66 | 5141 | 78.9 |
| D1 | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 45–46 * | 5141 | 112.8–115.2 |
| Alpha2C | 7, 10, 13 | 50 | 450 | 10.0 | 50 | 450 | 10.0 | 50 | 5141 | 103.8 | 8–11 * | 5141 | 468.4–643.6 |
* Different sets have slightly different numbers of active compounds.