Kedan He
Abstract
Facing the continuous emergence of new psychoactive substances (NPS) and their threat to public health, more effective methods for NPS prediction and identification are critical. In this study, the pharmacological affinity fingerprints (Ph-fp) of NPS compounds were predicted by Random Forest classification models using bioactivity data from the ChEMBL database. The binary Ph-fp is a vector encoding a compound's activity against a list of molecular targets reported to be responsible for the pharmacological effects of NPS. The performance of Ph-fp in similarity searching and unsupervised clustering was assessed and compared with that of the 2D structure fingerprints Morgan and MACCS (the 1024-bit ECFP4 and 166-bit SMARTS-based MACCS implementations in RDKit). The performance in retrieving compounds according to their pharmacological categorizations is influenced by the predicted active assay counts in Ph-fp and the choice of similarity metric. Overall, the comparative unsupervised clustering analysis suggests the use of a classification model with Morgan fingerprints as input for the construction of Ph-fp. This combination gives satisfactory clustering performance based on external and internal clustering validation indices.
Keywords: Bioactivity data; Machine learning; New psychoactive substances; Pharmacological affinity fingerprint; Similarity search; Unsupervised clustering
Year: 2022 PMID: 35672835 PMCID: PMC9171973 DOI: 10.1186/s13321-022-00607-6
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 8.489
Fig. 1 The workflow for the construction of the binary pharmacological affinity fingerprint. A total of 132 assay datasets for 70 unique molecular targets were extracted from the ChEMBL 29 database [55, 56]. Each RF classification model was trained with 90% of the data and validated with the remaining 10%, repeated 10 times (tenfold nested CV). Only models with a mean MCC greater than or equal to 0.90 were included in the Ph-fp construction for the NPS set compounds
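The workflow in Fig. 1 can be illustrated with a minimal sketch: one Random Forest per assay, a mean MCC over tenfold cross-validation, and one Ph-fp bit per model that passes the 0.90 cutoff. The function name `build_ph_fp`, the assay names, the toy data, and the choice of 50 trees are assumptions made for illustration; plain stratified tenfold CV is used here in place of the nested scheme, and nothing below is taken from the author's repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_score

MCC_CUTOFF = 0.90  # inclusion threshold quoted in the Fig. 1 caption


def build_ph_fp(assays, query_X, mcc_cutoff=MCC_CUTOFF, seed=0):
    """Return (kept assay names, binary Ph-fp matrix for query_X).

    assays maps a hypothetical assay name to (X, y): X is a structural
    fingerprint matrix, y holds the active (1) / inactive (0) labels.
    """
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scorer = make_scorer(matthews_corrcoef)
    kept, columns = [], []
    for name, (X, y) in assays.items():
        clf = RandomForestClassifier(n_estimators=50, random_state=seed)
        mean_mcc = cross_val_score(clf, X, y, cv=cv, scoring=scorer).mean()
        if mean_mcc < mcc_cutoff:
            continue  # weak model: this assay contributes no bit
        clf.fit(X, y)  # refit on all data before predicting query compounds
        kept.append(name)
        columns.append(clf.predict(query_X).astype(int))
    if not columns:
        return kept, np.zeros((len(query_X), 0), dtype=int)
    return kept, np.column_stack(columns)


# Toy data only: 16-bit "fingerprints" whose first 8 bits copy the label.
rng = np.random.default_rng(0)
y_easy = np.tile([0, 1], 100)
X_easy = rng.integers(0, 2, size=(200, 16))
X_easy[:, :8] = y_easy[:, None]
X_noise = rng.integers(0, 2, size=(200, 16))
y_noise = rng.integers(0, 2, size=200)  # labels unrelated to the bits

assays = {"assay_learnable": (X_easy, y_easy), "assay_noise": (X_noise, y_noise)}
kept, ph_fp = build_ph_fp(assays, X_easy[:5])
print(kept, ph_fp.shape)
```

In this toy setting only the learnable assay should survive the cutoff, so the resulting fingerprint has a single bit per query compound; with real ChEMBL assay sets the fingerprint length equals the number of retained models.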
Fig. 2 The workflow for the performance evaluation of Ph-fp in similarity search and clustering. a Similarity search is evaluated using EF10 and AUC. b Clustering performance is evaluated using both external (ARI, NMI) and internal (Silhouette score) validation indices
Pharmacological categorization and primary molecular targets of the NPS set compounds
| Pharmacological category | Target(s) | Actives |
|---|---|---|
| Stimulants | NET, DAT, SERT | 73 |
| Cannabinoids | CB1, CB2 | 29 |
| Serotonergic psychedelics | 5-HT2A, 5-HT2C | 53 |
| Depressant (opioids) | | 21 |
| Depressant (benzodiazepines) | GABAA | 13 |
This NPS dataset is available as a supporting file in the GitHub repository: https://github.com/nina23bom/NPS-Pharmacological-profile-fingerprint-prediction-using-ML
Clustering hyperparameters investigated
| Hyperparameters | Parameter | Values explored |
|---|---|---|
| Linkage | Ward | Minimizes the variance of the clusters being merged |
| | Complete | Maximum distance between all observations of the two sets |
| | Average | Average of the distances of each observation of the two sets |
| | Single | Minimum distance between all observations of the two sets |
| Fully connected graph (RBF) | | [1–5] |
| | eigen_tol | [0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001] |
| | n_neighbors | [7, 9, 11, 13, 15, 17, 19] |
| | eigen_tol | [0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001] |
The fcluster and dendrogram functions in the scipy.cluster.hierarchy package were used for hierarchical clustering, and the SpectralClustering class in the sklearn.cluster package was used for spectral clustering
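A minimal sketch of how these two clustering routes might be called is given here. The toy fingerprints, the helper names `hierarchical_labels` and `spectral_labels`, and the fixed seed are assumptions for illustration; only `linkage`, `fcluster`, and `SpectralClustering` are the library calls named above, and the four linkage methods are those listed in the hyperparameter table.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score

# Toy binary fingerprints (illustrative only): three groups of 10 compounds.
# Each group shares a block of 10 "on" bits; one bit is flipped per compound.
prototypes = np.zeros((3, 30), dtype=int)
for g in range(3):
    prototypes[g, 10 * g:10 * (g + 1)] = 1
X = np.repeat(prototypes, 10, axis=0)
for i in range(len(X)):
    X[i, i] ^= 1  # flip one distinct bit in each row
true_labels = np.repeat(np.arange(3), 10)


def hierarchical_labels(X, k, method="ward"):
    """Cut an agglomerative tree into at most k flat clusters."""
    Z = linkage(X, method=method)  # Euclidean distances on the bit vectors
    return fcluster(Z, t=k, criterion="maxclust")


def spectral_labels(X, k, affinity="rbf", gamma=1.0, n_neighbors=7, seed=0):
    """Spectral clustering; affinity may also be "nearest_neighbors"."""
    model = SpectralClustering(
        n_clusters=k,
        affinity=affinity,
        gamma=gamma,              # used by the RBF (fully connected) graph
        n_neighbors=n_neighbors,  # used by the nearest-neighbors graph
        random_state=seed,
    )
    return model.fit_predict(X)


ari_by_linkage = {
    method: adjusted_rand_score(true_labels, hierarchical_labels(X, 3, method))
    for method in ("ward", "complete", "average", "single")
}
spec = spectral_labels(X, 3)
print(ari_by_linkage)
print("spectral ARI:", adjusted_rand_score(true_labels, spec))
```

On such well-separated toy groups every linkage method recovers the reference partition; differences between linkage methods and affinity graphs only become visible on real, overlapping fingerprint data.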
Number of assay datasets used in the RF classification model and final length of Ph-fp
| Affinity cutoff | 5 (10 μM) | 6 (1 μM) | 7 (100 nM) |
|---|---|---|---|
| Total assay sets | 132 | 126 | 116 |
| Final | | | |
| MACCS (166 bits) | 113 | 110 | 102 |
| Morgan (1024 bits) | 107 | 106 | 104 |
Three different affinity cutoff values and two molecular descriptors were used in the assay data curation and classification model training, and only models with MCC ≥ 0.90 were included in the final Ph-fp construction
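The correspondence between a p-scale cutoff and a concentration follows from the standard definition pX = −log10(C), with C in mol/L. The short sketch below converts the three cutoffs to nanomolar; the helper name `cutoff_in_nanomolar` is hypothetical.

```python
def cutoff_in_nanomolar(p_value):
    """Convert a pKi/pIC50/pEC50 cutoff to a concentration in nM."""
    # p = -log10(C in mol/L)  =>  C = 10**(-p) M = 10**(9 - p) nM
    return 10 ** (9 - p_value)


for p in (5, 6, 7):
    print(p, cutoff_in_nanomolar(p), "nM")
```

A cutoff of 5 therefore corresponds to 10 000 nM (10 μM), 6 to 1 000 nM (1 μM), and 7 to 100 nM.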
Fig. 3 Representation of the 11 molecular target classes among the final models selected for the construction of Ph-fp. a Total assay datasets trained. b Final models included in constructing p5_maccs and p5_morgan. The actives in the assay data are defined as compounds with activity values (pKi, pIC50, and pEC50) greater than or equal to 5
Fig. 4 Frequency distribution of pair-wise comparisons of NPS set compounds. a Structural similarity calculated using the Tanimoto coefficient and structural molecular fingerprints. b Pharmacological similarity calculated using the Rogot-Goldberg index and pharmacological affinity fingerprints
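The two similarity measures named in Fig. 4 differ in how they treat shared "off" bits, which a small sketch makes concrete. With a = bits on in both vectors, b and c = bits on in only one, and d = bits off in both, the Tanimoto coefficient is a/(a+b+c) and the Rogot-Goldberg index is a/(2a+b+c) + d/(2d+b+c). The function names and the convention of returning 1.0 for identical vectors are assumptions of this sketch.

```python
import numpy as np


def _counts(x, y):
    """Contingency counts (a, b, c, d) for two equal-length bit vectors."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    a = int(np.sum(x & y))    # on in both
    b = int(np.sum(x & ~y))   # on in x only
    c = int(np.sum(~x & y))   # on in y only
    d = int(np.sum(~x & ~y))  # off in both
    return a, b, c, d


def tanimoto(x, y):
    a, b, c, _ = _counts(x, y)
    if a + b + c == 0:
        return 1.0  # both vectors all-off: treated as identical
    return a / (a + b + c)


def rogot_goldberg(x, y):
    a, b, c, d = _counts(x, y)
    if b == 0 and c == 0:
        return 1.0  # identical vectors; also avoids a zero denominator
    return a / (2 * a + b + c) + d / (2 * d + b + c)


x = [1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0, 0, 0]
print(tanimoto(x, y), rogot_goldberg(x, y))
```

For this sparse pair the Tanimoto coefficient is 1/3 while the Rogot-Goldberg index is 2/3, because the latter also credits the five positions where both vectors are inactive. This is relevant for Ph-fp, where most bits of a compound are expected to be 0.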
Fig. 5 Distribution of the MACCS, Morgan, and p5_maccs similarity values for p5_maccs-similar and p5_maccs-dissimilar compound pairs
Fig. 6 Histogram of the total active assay count ("1" bits) of NPS set compounds
Performance comparison of MACCS and Morgan fingerprints in pharmacological class similarity search
| Pharmacological class | MACCS EF10 | MACCS AUC | MACCS Opt_thr | | Morgan EF10 | Morgan AUC | Morgan Opt_thr |
|---|---|---|---|---|---|---|---|
| Stimulants | 4.15 | 0.67 | 0.72 | 0.26 | 7.70 | 0.95 | 0.34 |
| Cannabinoids | 7.61 | 0.95 | 0.76 | 0.30 | 8.61 | 0.97 | 0.39 |
| Serotonergic psychedelics | 7.45 | 0.91 | 0.78 | 0.34 | 8.96 | 0.99 | 0.39 |
| D-opioids | 8.23 | 0.98 | 0.77 | 0.43 | 9.26 | 0.99 | 0.44 |
| D-benzodiazepines | 7.55 | 0.94 | 0.80 | 0.52 | 10.1 | 0.99 | 0.50 |
| Average | 7.00 | 0.89 | | | 8.92 | 0.98 | |
The data shown is the average of 50 similarity searches for each pharmacological class
Both the query and test sets were assembled by random sampling at a 1:10 active-to-decoy ratio
Opt_thr is the optimal threshold, defined as the similarity value that maximizes G-Mean = $\sqrt{\mathrm{TPR} \times (1 - \mathrm{FPR})}$
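The three quantities reported in this table can be computed from a ranked list of similarity scores, as the following sketch shows. It assumes the usual definitions: EF10 is the active rate in the top 10% of the ranking divided by the active rate in the whole set, AUC is the area under the ROC curve, and the G-mean is the square root of sensitivity times specificity. The function names and the toy scores are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def enrichment_factor(labels, scores, fraction=0.10):
    """Active rate in the top-ranked fraction divided by the overall rate."""
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores), kind="stable")
    n_top = max(1, int(round(fraction * len(labels))))
    return labels[order][:n_top].mean() / labels.mean()


def gmean_threshold(labels, scores):
    """Score threshold that maximizes sqrt(TPR * (1 - FPR))."""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    gmean = np.sqrt(tpr * (1.0 - fpr))
    best = int(np.argmax(gmean))
    return thresholds[best], gmean[best]


# Toy ranking at a 1:10 active-to-decoy ratio: 2 actives, 20 decoys.
labels = np.array([1, 1] + [0] * 20)
scores = np.array([0.9, 0.6, 0.7] + [0.5 - 0.02 * i for i in range(19)])

ef10 = enrichment_factor(labels, scores)
auc = roc_auc_score(labels, scores)
opt_thr, best_gmean = gmean_threshold(labels, scores)
print(ef10, auc, opt_thr, best_gmean)
```

In this toy ranking one decoy outscores one active, so the top 10% (two compounds) holds one active and EF10 is 5.5, the AUC is 39/40 = 0.975, and the G-mean is maximized at a threshold of 0.6. At a 1:10 ratio the largest attainable EF10 is 11, which puts the values in the table in context.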
Performance comparison of Ph-fp in pharmacological category similarity search
| Fingerprint | MACCS | Morgan | Ave | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| AUC | |||||||||||
| | 0.0% | 6.6% | − 1.0% | − 9.6% | − 2.1% | − 2.0% | − 2.0% | − 14.1% | 0.94 | ||
| | 44.8% | 0.0% | 4.4% | − 1.0% | − 1.1% | 2.1% | − 2.1% | − 4.0% | − 2.0% | − 6.1% | 0.95 |
| | 28.4% | 0.0% | 6.6% | − 1.0% | 1.1% | − 9.5% | − 2.1% | − 2.0% | − 2.0% | − 4.0% | 0.94 |
| | 17.9% | − 3.2% | 0.0% | − 6.1% | − 2.1% | − 16.8% | − 5.2% | − 8.1% | − 7.1% | − 7.1% | 0.89 |
| | 22.4% | 2.1% | 2.2% | − 4.1% | − 17.0% | − 13.7% | 0.0% | − 6.1% | − 5.1% | − 21.2% | 0.89 |
| | 1.5% | − 2.1% | − 1.1% | − 7.1% | − 28.4% | − 4.1% | − 9.1% | − 8.1% | 0.77 | ||
| EF10 | |||||||||||
| | 0.8% | 12.1% | − 3.2% | − 14.6% | − 10.9% | − 6.8% | − 13.9% | − 35.8% | 7.78 | ||
| | 96.1% | − 6.18% | 6.4% | − 0.6% | − 3.8% | 5.7% | − 17.1% | − 11.5% | − 11.7% | − 27.8% | 7.73 |
| | 46.8% | 0.7% | 5.8% | 0.7% | 14.6% | − 20.9% | − 11.0% | − 12.1% | − 10.5% | − 13.9% | 7.71 |
| | − 31.6% | − 60.3% | − 57.6% | − 57.0% | − 1.9% | − 63.1% | − 64.9% | − 64.7% | − 61.8% | − 26.3% | 3.99 |
| | 26.0% | 1.8% | − 3.8% | − 8.0% | − 21.3% | − 32.1% | − 10.0% | − 20.0% | − 18.3% | − 40.9% | 6.73 |
| | − 29.9% | − 45.5% | − 39.2% | − 44.7% | − 62.2% | − 51.8% | − 49.4% | − 50.9% | 3.39 | ||
Stimu: Stimulants; Canna: Cannabinoids; S-psyche: Serotonergic psychedelics; D-opioids: Depressant opioids; D-benzo: Depressant benzodiazepine
The data shown is the average of 50 similarity searches for each pharmacological class using each fingerprint
Fig. 7 Performance of the hierarchical clustering using different linkage methods. ARI and NMI are calculated by requesting 5 and 17 clusters and comparing them with external K = MCS and K = Pharm class labels. The dashed lines are p6_maccs, p7_maccs (green) and p6_morgan, p7_morgan (red), respectively
Fig. 8 Performance of the algorithms when varying the expected number of clusters K. The ARI, NMI, and Silhouette were calculated by comparing to K = Pharm external labels. The red line indicates the five categories of NPS compounds. The default parameter gamma = 1 was used for spectral clustering, and the Ward linkage method was used for hierarchical clustering