Dmitry S Karlov, Sergey Sosnin, Igor V Tetko, Maxim V Fedorov.
Abstract
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It retains more information than certain dimensionality reduction techniques commonly used for this purpose, namely principal component analysis (PCA) and multidimensional scaling (MDS). The applicability of the method to chemical space navigation tasks (identification of activity cliffs and activity landscapes) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com). This journal is © The Royal Society of Chemistry.
Year: 2019 PMID: 35514634 PMCID: PMC9060647 DOI: 10.1039/c8ra10182e
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
Fig. 1 The schematic workflow of the pTSNE mapping procedure.
Fig. 2 The learning curves obtained for different perplexity values.
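As background for the perplexity values varied in Fig. 2: in standard t-SNE, perplexity sets the effective number of neighbours each point takes into account, and a per-point Gaussian precision is tuned (commonly by bisection) until the conditional neighbour distribution reaches the requested perplexity. The sketch below illustrates that calibration for a single point in plain Python. It is not the authors' implementation; the function names, the tolerance and the toy distances are assumptions made purely for illustration.

```python
import math


def conditional_probs(sq_dists, beta):
    """Gaussian conditional probabilities p(j|i) for one point i.

    sq_dists: squared distances from point i to every other point.
    beta: precision of the Gaussian kernel (1 / (2 * sigma**2)).
    """
    d_min = min(sq_dists)  # shift for numerical stability; cancels on normalisation
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def calibrate_beta(sq_dists, target_perplexity, tol=1e-5, max_iter=200):
    """Bisection on beta until the distribution has the target perplexity."""
    lo, hi = 0.0, None  # hi is unknown until the first overshoot
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            # Distribution is too flat: sharpen it by increasing beta.
            lo = beta
            beta = beta * 2.0 if hi is None else (beta + hi) / 2.0
        else:
            # Distribution is too peaked: flatten it by decreasing beta.
            hi = beta
            beta = (lo + beta) / 2.0
    return beta


# Hypothetical toy input: squared distances from one point to 20 others.
toy_sq_dists = [float(k * k) for k in range(1, 21)]
beta = calibrate_beta(toy_sq_dists, target_perplexity=5.0)
print(beta, perplexity(conditional_probs(toy_sq_dists, beta)))
```

A larger perplexity yields a smaller precision (a wider kernel), so each point is influenced by more of its neighbours.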
Fig. 3 The results of the neural network mapping for a set of GPCR ligands. The map contains ligands of adenosine A2 (aa2ar), adrenoreceptors β1 (adrb1) and β2 (adrb2), chemokine CXCR4 (cxcr4) and dopamine DR3 (drd3). Panels (A), (B) and (C) show zoomed areas from the upper left part of the figure (perplexity 100).
Results of applying machine learning methods to the initial ECFP6 fingerprints and to the 2D mapped spaces (multiclass classification)
| Descriptor set | ML method | Accuracy (GPCR ligands) | Accuracy (NR ligands) |
|---|---|---|---|
| ECFP6 descriptors | kNN | 0.829 | 0.526 |
| ECFP6 descriptors | SVM | 0.821 | 0.549 |
| ECFP6 descriptors | XGBoost | 0.821 | 0.540 |
| ECFP6 descriptors | Random forest | 0.788 | 0.537 |
| pTSNE mapping (2D space) | kNN | 0.763 | 0.383 |
| pTSNE mapping (2D space) | SVM | 0.704 | 0.336 |
| pTSNE mapping (2D space) | XGBoost | 0.764 | 0.394 |
| pTSNE mapping (2D space) | Random forest | 0.745 | 0.360 |
| PCA mapping (2 components) | kNN | 0.739 | 0.296 |
| PCA mapping (2 components) | SVM | 0.735 | 0.345 |
| PCA mapping (2 components) | XGBoost | 0.743 | 0.360 |
| PCA mapping (2 components) | Random forest | 0.735 | 0.349 |
| MDS mapping (2D space) | kNN | 0.725 | 0.326 |
| MDS mapping (2D space) | SVM | 0.543 | 0.250 |
| MDS mapping (2D space) | XGBoost | 0.712 | 0.333 |
| MDS mapping (2D space) | Random forest | 0.707 | 0.328 |
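The comparison in the table amounts to scoring the same classifier once on the original fingerprints and once on the 2D coordinates. A minimal sketch of that idea for a kNN classifier is given below. It does not reproduce the reported numbers or the authors' protocol: leave-one-out scoring, the Hamming distance on sets of on-bits, and the toy fingerprints, coordinates and labels are all assumptions chosen to keep the example short and self-contained.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of bits set in exactly one of two fingerprints (sets of on-bits)."""
    return len(a ^ b)


def euclidean(p, q):
    """Euclidean distance between two points of the mapped space."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5


def loo_knn_accuracy(items, labels, dist, k=1):
    """Leave-one-out accuracy of a majority-vote kNN classifier."""
    correct = 0
    for i, item in enumerate(items):
        # Rank every other item by its distance to the held-out item.
        neighbours = sorted(
            (dist(item, other), labels[j])
            for j, other in enumerate(items)
            if j != i
        )[:k]
        votes = Counter(label for _, label in neighbours)
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(items)


# Hypothetical toy data: six "molecules" in two target classes.
toy_fingerprints = [
    frozenset({0, 1, 2, 3}), frozenset({0, 1, 2, 4}), frozenset({0, 1, 3, 5}),
    frozenset({10, 11, 12, 13}), frozenset({10, 11, 12, 14}), frozenset({10, 11, 13, 15}),
]
toy_coords = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
toy_labels = ["a", "a", "a", "b", "b", "b"]

acc_fingerprints = loo_knn_accuracy(toy_fingerprints, toy_labels, hamming_distance)
acc_mapped = loo_knn_accuracy(toy_coords, toy_labels, euclidean)
print(acc_fingerprints, acc_mapped)
```

The gap between the two accuracies gives a rough indication of how much class-relevant neighbourhood structure a 2D mapping preserves, which is how the rows of the table can be read.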
Fig. 4 The dependence of the resulting distance on the initial molecular similarity for the TAAR1 data set (perplexity 100). Point colors reflect the density level: yellow marks the highest density and magenta the lowest.
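A plot such as Fig. 4 pairs, for every two molecules, their similarity in fingerprint space with their distance on the 2D map. The sketch below shows one way such pairs could be computed. The choice of Tanimoto similarity on sets of on-bits, the function names and the toy inputs are assumptions for illustration; the record above does not state which similarity measure the authors used.

```python
import math
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union


def similarity_distance_pairs(fingerprints, coords):
    """Return (similarity, mapped distance) for every unordered pair of molecules."""
    pairs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        dist = math.dist(coords[i], coords[j])  # Euclidean distance on the 2D map
        pairs.append((sim, dist))
    return pairs


# Hypothetical toy input: three fingerprints and their 2D map coordinates.
toy_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
toy_xy = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

for sim, dist in similarity_distance_pairs(toy_fps, toy_xy):
    print(f"similarity={sim:.2f}  distance={dist:.2f}")
```

If a mapping preserves neighbourhoods well, highly similar pairs should concentrate at small mapped distances.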
Fig. 5 The mapping results for the TAAR1 agonists data set (perplexity 100). Point colors reflect pEC50: yellow marks the highest activity and magenta the lowest.