Andrey A. Toropov, Alla P. Toropova.
Abstract
The ability of quantitative structure–property/activity relationships (QSPRs/QSARs) to serve epistemological processes in the natural sciences is discussed. Some weirdness of the QSPR/QSAR state of the art is listed. Research results in this area contain some contradictions, and sometimes these should be classified as paradoxes or weirdness. These points are often ignored; here, they are listed and briefly commented on. In addition, hypotheses on the future evolution of QSPR/QSAR theory and practice are suggested. In particular, the possibility of extending the scope of QSPR/QSAR problems by searching for the "statistical similarity" of different endpoints is suggested and illustrated by an example with relatively distant endpoints, namely (i) mutagenicity, (ii) anticancer activity, and (iii) blood–brain barrier.
Keywords: Monte Carlo method; QSAR evolution; fuzzy sets; multi-target QSAR
Year: 2020 PMID: 32178379 PMCID: PMC7143984 DOI: 10.3390/molecules25061292
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Distribution of 87 anticancer inhibitors [60] into training and validation sets.
The predictive potential of different approaches observed for different splits.
| Method | Split | Number of Compounds in Validation Set | Determination Coefficient for Validation Set |
|---|---|---|---|
|  | #1 | 18 | 0.77 |
|  | #1 | 18 | 0.43 |
|  | #1 | 18 | 0.53 |
|  | #2 | 22 | 0.84 |
|  | #3 | 21 | 0.81 |
|  | #2 | 22 | 0.82 |
|  | #3 | 21 | 0.85 |
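The table reports a determination coefficient for the validation set without restating its formula. Two definitions are common in QSAR work: 1 − SS_res/SS_tot and the squared Pearson correlation between observed and predicted values. The sketch below computes both; which one underlies the values above is not stated here, and the function names are illustrative.

```python
from statistics import mean
from typing import Sequence


def r2_determination(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def r2_correlation(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("zero variance in observed or predicted values")
    return cov * cov / (var_o * var_p)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [2.0, 3.0, 4.0, 5.0]  # every prediction shifted by +1
    print(r2_correlation(obs, pred), r2_determination(obs, pred))
```

For predictions shifted by a constant offset the two measures disagree: the squared correlation stays at 1.0, while 1 − SS_res/SS_tot drops to 0.2 for observed values 1, 2, 3, 4 and predictions 2, 3, 4, 5.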
Statistical criteria of the predictive potential for quantitative structure–property/activity relationship (QSPR/QSAR) models.
| Criterion of the Predictive Potential | Reference |
|---|---|
Figure 1. Comparison of the frequencies of use of quantitative structure–activity relationships (QSAR) and multi-target QSAR (mt-QSAR) in drug discovery research.
Simplified molecular-input line-entry system (SMILES) attributes used to build a model.
| SMILES Attribute | Comments |
|---|---|
|  | One symbol, or two symbols that cannot be examined separately in SMILES, e.g., Cl, Br, etc. |
|  | A combination of two connected |
| BOND | Descriptor reflecting the presence in SMILES of the symbols ‘@’, ‘=’, and ‘#’ (i.e., the presence of different bonds) |
| NOSP | Descriptor reflecting the presence of the chemical elements nitrogen (symbol ‘N’), oxygen (symbol ‘O’), sulfur (symbol ‘S’), and phosphorus (symbol ‘P’) |
| HALO | Descriptor reflecting the presence of fluorine (symbol ‘F’), chlorine (symbol ‘Cl’), bromine (symbol ‘Br’), and iodine (symbol ‘I’) |
| PAIR | Descriptor reflecting the simultaneous presence of a pair of the above features (i.e., features related to BOND, NOSP, and HALO), without any details about their positions in the molecular structure |
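The single-symbol attribute and the BOND, NOSP, and HALO descriptors can be illustrated with a short Python sketch. It assumes that only Cl and Br need two-character handling, that ‘)’ is recorded as ‘(’ (as the footnote to the example table states), and that the flag order inside BOND is ‘=’, ‘#’, ‘@’; only the first BOND position is confirmed by the worked example, where a molecule whose only special bond symbol is ‘=’ gets BOND10000000. Counting aromatic lower-case atoms for NOSP is also an assumption, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: only Cl and Br are treated as two-character symbols; every other
# character (ring digits, brackets, bond symbols) is one symbol. Bracket atoms
# such as [Na+] and '%nn' ring labels are not handled specially.
_SYMBOL = re.compile(r"Cl|Br|.")


def smiles_symbols(smiles: str) -> List[str]:
    """Split SMILES into single symbols, recording ')' as '('."""
    return ["(" if s == ")" else s for s in _SYMBOL.findall(smiles)]


def flag_code(prefix: str, flags: List[bool]) -> str:
    """Write presence flags after a 4-letter prefix, padded with '0' to 12 characters."""
    bits = "".join("1" if f else "0" for f in flags)
    return (prefix + bits).ljust(12, "0")


def global_descriptors(smiles: str) -> Dict[str, str]:
    symbols = set(smiles_symbols(smiles))
    # Assumed flag order: '=', '#', '@' (only '=' in first place is confirmed).
    bond = flag_code("BOND", [b in symbols for b in ("=", "#", "@")])
    # N, O, S, P; aromatic lower-case forms (n, o, s, p) also counted (assumption).
    nosp = flag_code("NOSP", [bool(symbols & {e, e.lower()}) for e in ("N", "O", "S", "P")])
    # F, Cl, Br, I
    halo = flag_code("HALO", [h in symbols for h in ("F", "Cl", "Br", "I")])
    return {"BOND": bond, "NOSP": nosp, "HALO": halo}


if __name__ == "__main__":
    print(global_descriptors("Clc1cc(Cl)ccc1C(O)=O"))
```

For the example molecule Clc1cc(Cl)ccc1C(O)=O this yields BOND10000000, NOSP01000000, and HALO01000000, the codes listed for it in the table that follows.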
Generalized representation of the above SMILES attributes for Clc1cc(Cl)ccc1C(O)=O.
| Attribute | Codes (12 characters each) |
|---|---|
| Single symbols* | Cl.........., c..........., 1..........., c..........., c..........., (..........., Cl.........., (..........., c..........., c..........., c..........., 1..........., C..........., (..........., O..........., (..........., =..........., O........... |
| Pairs of connected symbols (SS)** | c...Cl......, c...1......., c...1......., c...c......., c...(......., Cl..(......., Cl..(......., c...(......., c...c......., c...c......., c...1......., c...1......., C...1......., O...(......., O...(......., =...(......., =...=....... |
| BOND | BOND10000000 |
| NOSP | NOSP01000000 |
| HALO | HALO01000000 |
| PAIR | ++++Cl..O===, ++++Cl..B2==, ++++O...B2== |
*) Only ‘(’ is used, not ‘)’. **) Symbols in SS are ordered by ASCII code, to avoid wrongly interpreting AB and BA as non-equivalent features.
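The 12-character codes in the example table and the ordering rule of its second footnote can be sketched as follows. Single symbols are padded with dots; for a pair, the table rows (e.g., c...Cl, Cl..(, O...() suggest that the symbol with the larger ASCII code is written first, padded to four characters. The sketch treats "connected" as "adjacent in the SMILES string", which is an assumption; it is our reading of the table rather than the authors' code, may not reproduce every listed row, and its function names are hypothetical.

```python
import re
from typing import List

_SYMBOL = re.compile(r"Cl|Br|.")  # assumption: only Cl and Br are two-character symbols


def smiles_symbols(smiles: str) -> List[str]:
    """Single symbols of a SMILES string; ')' is recorded as '(' (first footnote)."""
    return ["(" if s == ")" else s for s in _SYMBOL.findall(smiles)]


def single_code(symbol: str) -> str:
    """12-character code of one symbol, padded with dots, e.g. 'Cl..........'."""
    return symbol.ljust(12, ".")


def pair_code(a: str, b: str) -> str:
    """Code of two symbols: the larger one in ASCII order is written first, so
    (a, b) and (b, a) give the same code. Widths 4 + 8 follow the table rows."""
    first, second = max(a, b), min(a, b)
    return first.ljust(4, ".") + second.ljust(8, ".")


def pair_codes(smiles: str) -> List[str]:
    """Codes for symbols adjacent in the SMILES string (assumed meaning of 'connected')."""
    symbols = smiles_symbols(smiles)
    return [pair_code(a, b) for a, b in zip(symbols, symbols[1:])]


if __name__ == "__main__":
    for code in pair_codes("Clc1cc(Cl)ccc1C(O)=O")[:5]:
        print(code)
```

Sorting the two symbols before formatting is what makes the pair code order-independent, which is the purpose the footnote gives for the ASCII rule.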
Definition of similarities between models for mutagenicity, anticancer activity, and blood–brain barrier (BBB). Model 1 (m1) is built for endpoint #1 and model 2 (m2) for endpoint #2; "m1.1" denotes the first optimization run for endpoint #1. A plus sign marks a promoter of an increase in the endpoint (#1 or #2); a minus sign marks a promoter of a decrease.
| No. | Attribute, SAk | m1.1 | m1.2 | m1.3 | m2.1 | m2.2 | m2.3 |
|---|---|---|---|---|---|---|---|
| Mutagenicity (#1) vs. Anticancer Activity (#2) | |||||||
| 1 | 1........... | + | + | + | + | + | + |
| 2 | c...2....... | + | + | + | + | + | + |
| 3 | c...(....... | + | + | + | + | + | + |
| 4 | 3........... | + | + | + | + | + | + |
| 5 | C........... | + | + | + | + | + | + |
| 6 | 1...(....... | + | + | + | + | + | + |
| 7 | C...1....... | + | + | + | + | + | + |
| 8 | C...3....... | + | + | + | + | + | + |
| 9 | Cl..(....... | + | + | + | + | + | + |
| 10 | Cl.......... | + | + | + | + | + | + |
| 1 | c........... | + | + | + | − | − | − |
| 2 | O........... | + | + | + | − | − | − |
| 3 | O...(....... | + | + | + | − | − | − |
| 4 | N...(....... | − | − | − | + | + | + |
| 5 | ++++N---O=== | − | − | − | + | + | + |
| 6 | NOSP11000000 | − | − | − | + | + | + |
| 7 | C...(....... | − | − | − | + | + | + |
| 8 | C...C....... | − | − | − | + | + | + |
| Mutagenicity (#1) vs. BBB (#2) | |||||||
| 1 | 1........... | + | + | + | + | + | + |
| 2 | BOND00000000 | + | + | + | + | + | + |
| 3 | HALO00000000 | + | + | + | + | + | + |
| 4 | NOSP10000000 | + | + | + | + | + | + |
| 5 | 1...(....... | + | + | + | + | + | + |
| 6 | ++++CL--N=== | + | + | + | + | + | + |
| 7 | -........... | + | + | + | + | + | + |
| 8 | =...(....... | + | + | + | + | + | + |
| 9 | C...1....... | + | + | + | + | + | + |
| 10 | BOND10000000 | + | + | + | + | + | + |
| 11 | Cl..(....... | + | + | + | + | + | + |
| 12 | Cl.......... | + | + | + | + | + | + |
| 13 | N...+....... | + | + | + | + | + | + |
| 14 | N........... | − | − | − | − | − | − |
| 1 | O........... | + | + | + | − | − | − |
| 2 | O...(....... | + | + | + | − | − | − |
| 3 | N...1....... | + | + | + | − | − | − |
| 4 | [...+....... | + | + | + | − | − | − |
| 5 | NOSP11000000 | − | − | − | + | + | + |
| 6 | C...(....... | − | − | − | + | + | + |
| 7 | C...C....... | − | − | − | + | + | + |
| BBB (#1) vs. anticancer activity (#2) | |||||||
| 1 | C...C....... | + | + | + | + | + | + |
| 2 | C...(....... | + | + | + | + | + | + |
| 3 | 1........... | + | + | + | + | + | + |
| 4 | C...1....... | + | + | + | + | + | + |
| 5 | C...=....... | + | + | + | + | + | + |
| 6 | ++++N---B2== | + | + | + | + | + | + |
| 7 | C...2....... | + | + | + | + | + | + |
| 8 | NOSP11000000 | + | + | + | + | + | + |
| 9 | 1...(....... | + | + | + | + | + | + |
| 10 | O...C....... | + | + | + | + | + | + |
| 11 | 2...(....... | + | + | + | + | + | + |
| 12 | 4........... | + | + | + | + | + | + |
| 13 | Cl.......... | + | + | + | + | + | + |
| 14 | Cl..(....... | + | + | + | + | + | + |
| 15 | ++++S---B2== | + | + | + | + | + | + |
| 16 | HALO01000000 | + | + | + | + | + | + |
| 17 | ++++F---B2== | + | + | + | + | + | + |
| 18 | ++++F---N=== | + | + | + | + | + | + |
| 19 | HALO10000000 | + | + | + | + | + | + |
| 20 | N...4....... | + | + | + | + | + | + |
| 21 | ++++CL--S=== | + | + | + | + | + | + |
| 22 | (........... | − | − | − | − | − | − |
| 23 | O........... | − | − | − | − | − | − |
| 24 | O...(....... | − | − | − | − | − | − |
| 25 | 5........... | − | − | − | − | − | − |
| 26 | C...5....... | − | − | − | − | − | − |
| 1 | ++++Cl--B2== | + | + | + | − | − | − |
| 2 | F...(....... | + | + | + | − | − | − |
| 3 | ++++F---Cl== | + | + | + | − | − | − |
| 4 | ++++O---B2== | − | − | − | + | + | + |
| 5 | 2........... | − | − | − | + | + | + |
| 6 | =...2....... | − | − | − | + | + | + |
| 7 | 3...(....... | − | − | − | + | + | + |
| 8 | ++++O---S=== | − | − | − | + | + | + |
Similarity matrix for the examined endpoints.
| Similarity | Mutagenicity | Anticancer Activity | Blood–Brain Barrier |
|---|---|---|---|
| Mutagenicity | 41 | 10 | 14 |
| Anticancer Activity | 10 | 61 | 26 |
| Blood–Brain Barrier | 14 | 26 | 92 |
| | | | |
| Mutagenicity | 11 | 8 | 7 |
| Anticancer Activity | 8 | 24 | 8 |
| Blood–Brain Barrier | 7 | 8 | 52 |
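Comparing the similarity matrix with the sign table above, the off-diagonal entries 10, 14, and 26 equal the numbers of attributes that keep the same sign in all runs of both models for the corresponding pair of endpoints, and 8, 7, and 8 the numbers with opposite signs. A minimal Python sketch of that count follows; it is our reading of the two tables, the function names are illustrative, and attributes whose sign varies between runs are simply skipped (an assumption; the sign table lists none).

```python
from typing import Iterable, Optional, Sequence, Tuple


def stable_sign(runs: Sequence[str]) -> Optional[str]:
    """Return '+' or '-' if all optimization runs agree, else None.
    The Unicode minus sign (U+2212) used in the table is treated as '-'."""
    signs = {s.replace("\u2212", "-") for s in runs}
    if len(signs) == 1 and signs <= {"+", "-"}:
        return signs.pop()
    return None


def compare_models(rows: Iterable[Tuple[str, Sequence[str], Sequence[str]]]) -> Tuple[int, int]:
    """Count attributes whose stable sign is the same / opposite in models m1 and m2."""
    same = opposite = 0
    for _attribute, m1_runs, m2_runs in rows:
        s1, s2 = stable_sign(m1_runs), stable_sign(m2_runs)
        if s1 is None or s2 is None:
            continue  # sign changes between runs: not counted (assumption)
        if s1 == s2:
            same += 1
        else:
            opposite += 1
    return same, opposite


# Four rows copied from the mutagenicity vs. anticancer activity block.
excerpt = [
    ("1...........", "+++", "+++"),
    ("c...2.......", "+++", "+++"),
    ("c...........", "+++", "\u2212" * 3),
    ("N...(.......", "\u2212" * 3, "+++"),
]
print(compare_models(excerpt))
```

Applied to the full mutagenicity vs. anticancer activity block, the same count would give 10 same-sign and 8 opposite-sign attributes.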
Promoters of increased carcinogenicity in male rats (MR) and female rats (FR). Column labels give the split and the run within it (e.g., MR 2.3: male rats, split 2, run 3); 1 marks a feature identified as a promoter of increase in that run.
| Promoter of carcinogenicity increase | MR 1.1 | MR 1.2 | MR 1.3 | MR 2.1 | MR 2.2 | MR 2.3 | MR 3.1 | MR 3.2 | MR 3.3 | Total MR | FR 1.1 | FR 1.2 | FR 1.3 | FR 2.1 | FR 2.2 | FR 2.3 | FR 3.1 | FR 3.2 | FR 3.3 | Total FR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Molecular features extracted from SMILES |
| 1...(....... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 7 |
| 2...(....... | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2...1....... | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C...1....... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| C...2....... | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| N...=....... | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| N...1....... | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| HALO00000000 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BOND00000000 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
| BOND10000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 5 |
| BOND10100000 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Molecular features (invariants) extracted from molecular graph* |
| C5......0... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 4 |
| C6......0... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 8 |
| NNC-C...101. | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |
| NNC-C...110. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 5 |
| NNC-C...211. | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
| NNC-C...303. | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NNC-C...321 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 7 |
| NNC-O...101 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| Summation | | | | | | | | | | 63 | | | | | | | | | | 50 |
*) A detailed description of C5……… and C6……… is given in [74]; a detailed description of NNC-Y…xxx is given in [80].
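In each feature row, the Total MR and Total FR values are counts of the nine runs (three splits × three runs) in which the feature is flagged; for example, the female-rat flags 1 1 1 1 0 1 1 0 1 of the first row give 7. A minimal Python sketch that recomputes these totals and lists the features flagged in all nine runs is shown below; the row layout follows the table, and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def sex_totals(flags: List[int]) -> Tuple[int, int]:
    """Totals for one feature row: 9 male-rat flags (splits 1-3 x runs 1-3),
    then 9 female-rat flags in the same order."""
    if len(flags) != 18:
        raise ValueError("expected 9 MR flags followed by 9 FR flags")
    return sum(flags[:9]), sum(flags[9:])


def in_every_run(rows: Dict[str, List[int]], sex: str) -> List[str]:
    """Features flagged as promoters in all nine runs for 'MR' or 'FR'."""
    if sex not in ("MR", "FR"):
        raise ValueError("sex must be 'MR' or 'FR'")
    index = 0 if sex == "MR" else 1
    return [name for name, flags in rows.items() if sex_totals(flags)[index] == 9]


# Per-run flags copied from four rows of the table (totals columns omitted).
rows = {
    "1...(.......": [0] * 9 + [1, 1, 1, 1, 0, 1, 1, 0, 1],
    "HALO00000000": [1] * 9 + [0] * 9,
    "BOND00000000": [1, 0, 1, 0, 0, 0, 1, 0, 1] + [1] * 9,
    "NNC-C...211.": [1] * 18,
}
print(sex_totals(rows["1...(......."]))
print(in_every_run(rows, "MR"))
```

The first call gives (0, 7), matching the Total MR and Total FR columns of that row; features flagged in every run are the most stable promoters across splits.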