Ignacio Ponzoni, Víctor Sebastián-Pérez, María J Martínez, Carlos Roca, Carlos De la Cruz Pérez, Fiorella Cravero, Gustavo E Vazquez, Juan A Páez, Mónica F Díaz, Nuria E Campillo.
Abstract
Alzheimer's disease is one of the most common neurodegenerative disorders in the elderly population. The β-site amyloid precursor protein cleaving enzyme 1 (BACE1) initiates the production of amyloid-β, the major constituent of amyloid plaques, and plays a central role in this brain pathogenesis; it is therefore a promising pharmacological target for treatment. In this paper, a QSAR model for identifying potential inhibitors of the BACE1 protein is designed using classification methods. To build this model, a database of 215 molecules collected from different sources was assembled. The dataset contains diverse compounds with different scaffolds and physicochemical properties, covering a wide chemical space in the drug-like range. The most distinctive aspect of the applied QSAR strategy is the combination of hybridization with backward elimination of models, which improves the quality of the final QSAR model. Another relevant step is the visual analysis of the molecular descriptors, which helps to verify the absence of information redundancy in the model. The performance of the QSAR models was assessed with traditional metrics; the final proposed model has low cardinality and correctly classifies a high percentage of the chemical compounds.
Year: 2019 PMID: 31235739 PMCID: PMC6591229 DOI: 10.1038/s41598-019-45522-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Graphical representation of physicochemical and drug-like properties of the BACE1 dataset. (A) Dispersion of compounds by predicted logP (x-axis) and predicted logBB (y-axis). Colors are defined by % human oral absorption. (B) Dispersion of the dataset by molecular weight (x-axis) and a parameter related to the physicochemical properties of commercially available drugs (y-axis). Colors are defined by the number of violations of the rule of 5.
Number of molecular descriptors of each family computed for the database compounds.
| Type of Molecular Descriptors | # MD | Type of Molecular Descriptors | # MD |
|---|---|---|---|
| constitutional descriptors | 48 | geometrical descriptors | 74 |
| topological descriptors | 119 | RDF descriptors | 150 |
| walk and path counts | 47 | 3D-MoRSE descriptors | 160 |
| connectivity indices | 33 | WHIM descriptors | 99 |
| information indices | 47 | GETAWAY descriptors | 197 |
| 2D autocorrelations | 96 | functional group counts | 154 |
| edge adjacency indices | 107 | atom-centred fragments | 120 |
| burden eigenvalues | 64 | charge descriptors | 14 |
| topological charge indices | 21 | molecular properties | 29 |
| eigenvalue-based indices | 44 | 2D binary fingerprints | 780 |
| Randic molecular profiles | 41 | 2D frequency fingerprints | 780 |
Figure 2. Graphical scheme of experiments reported for the prediction of inhibitors of protein BACE1 by applying QSAR modelling.
Molecular descriptors of DRAGON associated with the selected subsets.
| FS Method | Subset | Cardinality | MDs | Type |
|---|---|---|---|---|
| DELPHOS | A | 4 | MW | Constitutional indices |
| | | | Mor31p | 3D-MoRSE descriptors |
| | | | nCrs | Functional group counts |
| | | | N-069 | Atom-centered fragments |
| DELPHOS | B | 4 | MW | Constitutional indices |
| | | | piPC04 | Walk and path counts |
| | | | EEig14d | Eigenvalues |
| | | | Mor25p | 3D-MoRSE descriptors |
| WEKA | C | 10 | nTB | Constitutional indices |
| | | | nR03 | Ring descriptors |
| | | | IC3 | Information indices |
| | | | G(S.F) | 3D Atom Pairs |
| | | | nN = C-N< | Functional group counts |
| | | | nRNH2 | Functional group counts |
| | | | C-041 | Atom-centered fragments |
| | | | B05[C-Cl] | 2D Atom Pairs |
| | | | F03[C-O] | 2D Atom Pairs |
| | | | F04[C-C] | 2D Atom Pairs |
| Literature | D | 4 | H1e | GETAWAY descriptors |
| | | | RDF080m | RDF descriptors |
| | | | H6m | GETAWAY descriptors |
| | | | GGI7 | 2D autocorrelations |
Performance of the best QSAR classifier obtained for each subset during external validation. The best model is highlighted in bold.
| Subset | Method | %CC | ROC | Confusion Matrix | |
|---|---|---|---|---|---|
| A | RC | 67 | 0.71 | 21 | 10 |
| | | | | 7 | 14 |
| B | RC | 69 | 0.69 | 25 | 6 |
| | | | | 10 | 11 |
| C | RF | 75 | 0.83 | 26 | 5 |
| | | | | 8 | 13 |
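The %CC column can be cross-checked against the confusion matrices in the table above. The following is a minimal sketch, assuming that correctly classified compounds lie on the main diagonal of each matrix (the table does not label its rows and columns); the function name `percent_correct` is illustrative and not taken from the paper.

```python
def percent_correct(matrix):
    """Return the percentage of correctly classified cases (%CC).

    `matrix` is a square confusion matrix given as a list of rows.
    Assumption: correct predictions sit on the main diagonal.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Confusion matrices and %CC values copied from the table above.
reported = {
    "A": ([[21, 10], [7, 14]], 67),
    "B": ([[25, 6], [10, 11]], 69),
    "C": ([[26, 5], [8, 13]], 75),
}

for subset, (matrix, reported_cc) in reported.items():
    # Print the recomputed value next to the reported one.
    print(subset, round(percent_correct(matrix)), reported_cc)
```

Under that assumption, each matrix sums to 52 compounds and the recomputed percentages round to the reported %CC values.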
Hybridized subset obtained from the union of different subsets.
| Hybridized Subset | Combined Subsets | Cardinality |
|---|---|---|
| HS1 | Subset D ∪ Subset A | 8 |
| HS2 | Subset D ∪ Subset B | 8 |
| HS3 | Subset D ∪ Subset C | 14 |
| HS4 | Union of all subsets | 21 |
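The cardinalities in the table above follow directly from set union over the descriptor names listed for subsets A to D. A small sketch, using only those names, shows why HS4 has 21 descriptors rather than 22: MW appears in both subset A and subset B and is counted once.

```python
# Descriptor names copied from the table of selected subsets.
subsets = {
    "A": {"MW", "Mor31p", "nCrs", "N-069"},
    "B": {"MW", "piPC04", "EEig14d", "Mor25p"},
    "C": {
        "nTB", "nR03", "IC3", "G(S.F)", "nN = C-N<", "nRNH2",
        "C-041", "B05[C-Cl]", "F03[C-O]", "F04[C-C]",
    },
    "D": {"H1e", "RDF080m", "H6m", "GGI7"},
}

# Hybridization as plain set union; shared descriptors are kept once.
hybridized = {
    "HS1": subsets["D"] | subsets["A"],
    "HS2": subsets["D"] | subsets["B"],
    "HS3": subsets["D"] | subsets["C"],
    "HS4": set().union(*subsets.values()),
}

for name, descriptors in hybridized.items():
    print(name, len(descriptors))
```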
Performance of the best QSAR classifier obtained for each hybridized subset during external validation. The best model is highlighted in bold.
| Subset | Cardinality | Method | %CC | ROC | Confusion Matrix | |
|---|---|---|---|---|---|---|
| HS2 | 8 | RF | 77 | 0.79 | 27 | 4 |
| | | | | | 8 | 13 |
| HS3 | 14 | RC | 83 | 0.83 | 28 | 3 |
| | | | | | 6 | 15 |
| HS4 | 21 | RC | 85 | 0.84 | 30 | 1 |
| | | | | | 7 | 14 |
Performance during external validation of the best QSAR classifier inferred for the reduced HS1 subset at each step. The final model has 6 molecular descriptors, classifies 85% of cases correctly and has a ROC area of 0.88.
| Subset | Step | Cardinality | Method | %CC | ROC | Confusion Matrix | |
|---|---|---|---|---|---|---|---|
| HS1 - MW | 1 | 7 | RF | 85 | 0.85 | 28 | 3 |
| | | | | | | 5 | 16 |
| HS1 - MW - –N-069 | 3 | 5 | RF | 83 | 0.89 | 29 | 2 |
| | | | | | | 7 | 14 |
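The stepwise reduction reported in the table above can be illustrated with a generic greedy backward elimination. This is a sketch under stated assumptions, not the authors' procedure: the source does not specify how the descriptor to remove is chosen at each step, so the code simply drops the descriptor whose removal gives the highest score. The names `backward_elimination`, `toy_score` and the descriptors `d1` to `d5` are hypothetical, and `toy_score` stands in for training a classifier and measuring its external-validation performance.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Score = Callable[[FrozenSet[str]], float]


def backward_elimination(
    descriptors: Iterable[str], score: Score, min_size: int = 1
) -> List[Tuple[FrozenSet[str], float]]:
    """Greedy backward elimination over a descriptor set.

    Starting from the full set, repeatedly drop the single descriptor
    whose removal yields the highest score. Every intermediate subset is
    recorded so the caller can weigh cardinality against performance.
    """
    current = frozenset(descriptors)
    history = [(current, score(current))]
    while len(current) > min_size:
        # One candidate per descriptor that could be removed.
        candidates = [current - {d} for d in sorted(current)]
        current = max(candidates, key=score)
        history.append((current, score(current)))
    return history


# Hypothetical stand-in for model training and validation: two
# descriptors are treated as informative, the others as noise.
INFORMATIVE = frozenset({"d1", "d2"})


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & INFORMATIVE) - 0.1 * len(subset - INFORMATIVE)


history = backward_elimination(["d1", "d2", "d3", "d4", "d5"], toy_score, min_size=2)
best_subset, best_score = max(history, key=lambda item: item[1])
print(sorted(best_subset), best_score)
```

Keeping the full history, rather than only the last subset, mirrors the table: the model retained need not be the smallest one, since a later step can lower one metric while raising another.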
Figure 3. Performance during external validation of the best QSAR model achieved in each experimental step.
Figure 4. Kendall correlation among descriptors of the best model.
Figure 5. Results of the feature selection randomization experiment.
Figure 6. Results of the y-randomization experiment.
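For readers unfamiliar with the Kendall correlation used in Figure 4 to check descriptors for redundancy, the coefficient can be computed pairwise from concordant and discordant observations. The sketch below implements the tau-a variant, which is an assumption: the source does not state which variant or software was used, and the sample values are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a between two equally long sequences.

    Counts concordant minus discordant pairs and divides by the total
    number of pairs. Tied pairs count as neither (no tie correction).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical descriptor values for four compounds.
descriptor_a = [0.2, 0.5, 0.9, 1.4]
descriptor_b = [1.0, 3.0, 2.0, 4.0]
print(kendall_tau_a(descriptor_a, descriptor_b))
```

A value near 1 or -1 for a pair of descriptors would suggest that they carry largely redundant ranking information, while values near 0 suggest that they are complementary.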