Raquel Rodríguez-Pérez, Jürgen Bajorath.
Abstract
Machine learning is widely applied in drug discovery research to predict molecular properties and aid in the identification of active compounds. Herein, we introduce a new approach that uses model-internal information from compound activity predictions to uncover relationships between target proteins. On the basis of a large-scale analysis generating and comparing machine learning models for more than 200 proteins, feature importance correlation analysis is shown to detect similar compound binding characteristics. Furthermore, rather unexpectedly, the analysis also reveals functional relationships between proteins that are independent of active compounds and binding characteristics. Feature importance correlation analysis does not depend on specific representations, algorithms, or metrics and is generally applicable as long as predictive models can be derived. Moreover, the approach does not require or involve explainable or interpretable machine learning, but only access to feature weights or importance values. On the basis of our findings, the approach represents a new facet of machine learning in drug discovery with potential for practical applications.
Year: 2021 PMID: 34244588 PMCID: PMC8270985 DOI: 10.1038/s41598-021-93771-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Model performance.
| | Recall | BA | F1 | MCC |
|---|---|---|---|---|
| Mean | 93% | 96% | 0.90 | 0.90 |
| Std | 8% | 4% | 0.12 | 0.11 |
| Min | 66% | 83% | 0.47 | 0.54 |
The mean, standard deviation (Std) and minimum (Min) values are reported for multiple metrics including recall, BA, F1 score, and MCC across the 218 RF models.
Figure 1. Feature importance correlation. Distributions of feature importance correlation values are reported in boxplots for all protein pairs in the data set. Correlation values were calculated using the Pearson (blue) and Spearman (gray) coefficients.
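As an illustration of the comparison summarized in Figure 1, the sketch below computes Pearson and Spearman coefficients for the feature importance vectors of two models. It assumes that both models were trained on the same feature representation, so that position i refers to the same feature in both vectors. The helper names and importance values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values from 1 (smallest) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical importance values for five shared features (illustration only).
importance_a = [0.40, 0.30, 0.20, 0.10, 0.00]
importance_b = [0.35, 0.25, 0.30, 0.10, 0.00]

r = pearson(importance_a, importance_b)
rho = spearman(importance_a, importance_b)
```

Pearson is sensitive to the magnitudes of the importance values, whereas Spearman depends only on the feature ranking, which is one reason the two coefficients can differ for the same protein pair.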
Figure 2. Correlation for protein pairs with common active compounds. Mean feature importance correlation values are reported for protein pairs with increasing numbers of shared compounds.
Figure 3. Correlation for protein pairs with common GO annotations. Mean feature importance correlation values are reported for protein pairs with increasing GO Tc values.
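The caption for Figure 3 does not define "GO Tc". Assuming it denotes the Tanimoto coefficient computed over the sets of GO terms annotated to two proteins, the overlap could be sketched as follows. The function name and the term labels are hypothetical placeholders, not real annotations.

```python
def tanimoto(terms_a, terms_b):
    """Tanimoto coefficient of two sets: |A and B| / |A or B|.

    Returns 0.0 when both sets are empty (a convention assumed here).
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder GO term labels for two hypothetical proteins.
go_terms_a = {"GO:term_1", "GO:term_2", "GO:term_3"}
go_terms_b = {"GO:term_1", "GO:term_2", "GO:term_4"}

tc = tanimoto(go_terms_a, go_terms_b)  # 2 shared terms out of 4 distinct terms
```

Under this assumption, a Tc of 0 means that two proteins share no GO terms and a Tc of 1 means that their annotations are identical.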
Figure 4. Multi-target compounds. Shown are two exemplary clinical compounds with different activity. Each of these compounds is active against strongly correlated target proteins. (a) Zicronapine, with activity against HTR2A and dopamine D1 and D2 receptors. (b) Serotonin-norepinephrine-dopamine reuptake inhibitor with activity against the dopamine, norepinephrine, and serotonin transporter proteins.
Exemplary strongly correlated pairs of proteins from different classes.
| Target 1 name | Target 1 classification | Target 2 name | Target 2 classification | Pearson/Spearman correlation |
|---|---|---|---|---|
| Cystinyl aminopeptidase | Enzyme/protease | Estrogen receptor beta | Transcription factor/nuclear receptor | 0.77/0.37 |
| Cystinyl aminopeptidase | Enzyme/protease | Estrogen receptor alpha | Transcription factor/nuclear receptor | 0.72/0.41 |
| Corticotropin releasing factor receptor 1 | Membrane receptor/G protein coupled receptor (GPCR) | Phosphodiesterase 10A | Enzyme/hydrolase | 0.72/0.41 |
| Adenosine A1 receptor | Membrane receptor/GPCR | PI3-kinase p110-delta subunit | Enzyme/transferase | 0.65/0.49 |
| Carboxyl-esterase 2 | Enzyme | Neuronal acetylcholine receptor protein alpha-7 subunit | Ion channel/ligand-gated ion channel | 0.61/0.50 |
| Prostanoid DP receptor | Membrane receptor/GPCR | Protein-tyrosine phosphatase 1B | Enzyme/hydrolase | 0.60/0.45 |
| Adenosine A2a receptor | Membrane receptor/GPCR | Phosphodiesterase 10A | Enzyme/hydrolase | 0.60/0.44 |
| Monoamine oxidase B | Enzyme/oxidoreductase | Serotonin 2c receptor | Membrane receptor/GPCR | 0.59/0.63 |
| Carbonic anhydrase IX | Enzyme/lyase | Serotonin 6 (5-HT6) receptor | Membrane receptor/GPCR | 0.58/0.65 |
| Peroxisome proliferator-activated receptor gamma | Transcription factor/nuclear receptor | Protein-tyrosine phosphatase 1B | Enzyme/hydrolase | 0.57/0.49 |
| Beta amyloid A4 protein | Membrane receptor | Serotonin transporter | Transporter/electrochemical transporter | 0.53/0.57 |
| Acyl coenzyme A: cholesterol acyltransferase | Enzyme | Cannabinoid CB1 receptor | Membrane receptor/GPCR | 0.36/0.69 |