Jürgen Drewe¹, Ernst Küsters², Felix Hammann³, Matthias Kreuter¹, Philipp Boss⁴, Verena Schöning³.
Abstract
The adenosine monophosphate-activated protein kinase (AMPK) is critical in the regulation of important cellular functions such as lipid, glucose, and protein metabolism; mitochondrial biogenesis and autophagy; and cellular growth. In many diseases, such as metabolic syndrome, obesity, diabetes, and also cancer, activation of AMPK is beneficial. Therefore, there is growing interest in AMPK activators that act either directly on the enzyme itself or indirectly by activating upstream regulators. Many natural compounds have been described that activate AMPK indirectly. These compounds usually occur in mixtures with a variety of other, structurally different compounds, which in turn can also alter the activity of AMPK via one or more pathways. For these compounds, experiments are complicated, since the required pure substances have often not yet been isolated and/or are not sufficiently available. Therefore, our goal was to develop a screening tool that could handle the profound heterogeneity in the activation pathways of AMPK. Since machine learning algorithms can model complex (unknown) relationships and patterns, several of these methods (random forest, support vector machines, stochastic gradient boosting, logistic regression, and deep neural network) were applied and validated using a database comprising 904 activating and 799 neutral or inhibiting compounds identified by an extensive search of the PubMed literature and the PubChem BioAssay database. All models showed unexpectedly high classification accuracy in training and, more importantly, in predicting the unseen test data. These models are therefore suitable tools for rapid in silico screening of established substances or multicomponent mixtures and can be used to identify compounds of interest for further testing.
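High accuracy on unseen data is usually checked against a chance baseline, and the results table in this record includes a Y-Randomization column for that purpose. The following is only a minimal sketch of the idea, not the authors' pipeline: it assumes a toy nearest-centroid classifier and synthetic two-dimensional data in place of the paper's models and molecular descriptors, and every function name in it is hypothetical.

```python
import random

def fit_centroids(X, y):
    """Toy stand-in model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, X):
    """Assign each row to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist2(centroids[c], row)) for row in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def y_randomization(X_train, y_train, X_test, y_test, n_perm=100, seed=0):
    """Refit on shuffled training labels n_perm times; return test accuracies."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        shuffled = list(y_train)
        rng.shuffle(shuffled)  # destroys the feature-label relationship
        model = fit_centroids(X_train, shuffled)
        scores.append(accuracy(y_test, predict(model, X_test)))
    return scores

# Synthetic data (assumption): two well-separated Gaussian clusters.
rng = random.Random(42)
data = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(200)]
data += [([rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)], 1) for _ in range(200)]
rng.shuffle(data)
X = [features for features, _ in data]
y = [label for _, label in data]
X_train, y_train, X_test, y_test = X[:280], y[:280], X[280:], y[280:]

real_acc = accuracy(y_test, predict(fit_centroids(X_train, y_train), X_test))
perm_scores = y_randomization(X_train, y_train, X_test, y_test, n_perm=100)
mean_perm = sum(perm_scores) / len(perm_scores)
print(f"real labels: {real_acc:.3f}, shuffled labels (mean): {mean_perm:.3f}")
```

If a model has learned a real structure-activity relationship, the accuracy with the true labels should sit well above the mean accuracy obtained with shuffled labels, which should stay near chance level.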
Keywords: AMPK activator; QSAR; deep learning; logistic regression; machine learning; random forest; support vector machine
Year: 2021 PMID: 34770917 PMCID: PMC8587902 DOI: 10.3390/molecules26216508
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. t-distributed stochastic neighbor embedding (tSNE) analysis: AMPK activators and controls.
Figure 2. tSNE analysis of chemically classified activators and controls separated by chemical structure. (A) AMPK activators (N = 904); (B) AMPK controls (N = 799).
Summary of classification results for the different machine learning methods.
| Method | Training Accuracy (%) | Test Accuracy (%) | Y-Randomization Accuracy (%) ** | Test Precision (%) | Sensitivity (%) | Specificity (%) | AUC * |
|---|---|---|---|---|---|---|---|
| RFC | 91.6 | 92.6 | 52.7 ± 2.3 | 90.3 | 91.2 | 94.0 | 0.968 ± 0.013 |
| SVM-C | 91.0 | 93.0 | 53.2 ± 2.2 | 90.1 | 93.5 | 92.4 | 0.962 ± 0.009 |
| SGB | 91.3 | 93.0 | 52.8 ± 2.2 | 90.7 | 92.0 | 94.0 | 0.968 ± 0.012 |
| LRC | 90.8 | 91.0 | 52.6 ± 2.1 | 89.2 | 97.4 | 94.8 | 0.948 ± 0.014 |
| DNN | 91.6 | 90.6 | 53.0 ± 1.8 | 87.6 | 90.2 | 91.1 | 0.970 ± 0.002 |
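Test set (number of compounds): activators (262), controls (249). RFC = random forest classifier; SVM-C = support vector machine classifier; SGB = stochastic gradient boosting classifier; LRC = logistic regression classifier; DNN = deep neural network classifier. * AUC = area under the receiver operating characteristic curve. ** N = 100 permutations.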
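The test-set counts in the footnote allow a quick consistency check between the reported sensitivity, specificity, and test accuracy, because overall accuracy is the count-weighted mean of the two per-class rates. The sketch below assumes that activators are the positive class (the record does not state this), and the helper name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by per-class rates and class counts."""
    true_pos = sensitivity * n_pos   # correctly classified positives
    true_neg = specificity * n_neg   # correctly classified negatives
    return (true_pos + true_neg) / (n_pos + n_neg)

# Test-set counts from the table footnote (activators assumed positive).
N_ACTIVATOR, N_CONTROL = 262, 249

# (sensitivity %, specificity %) exactly as listed in the table.
reported = {
    "RFC": (91.2, 94.0),
    "SVM-C": (93.5, 92.4),
    "SGB": (92.0, 94.0),
    "LRC": (97.4, 94.8),
    "DNN": (90.2, 91.1),
}

implied = {
    name: round(100 * accuracy_from_rates(sens / 100, spec / 100,
                                          N_ACTIVATOR, N_CONTROL), 1)
    for name, (sens, spec) in reported.items()
}

for name, acc in implied.items():
    print(f"{name}: implied test accuracy {acc}%")
```

For RFC, SVM-C, SGB, and DNN the implied values (92.6, 93.0, 93.0, and 90.6%) reproduce the Test Accuracy column. For LRC the listed rates imply about 96.1% rather than the tabulated 91.0%, so one of the LRC entries in this record may have been garbled in transcription; the original article should be consulted before relying on that row.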
Figure 3. Feature importance (standard deviation) of the first 10 features for random forest classification; nAcid = number of acidic groups; ALogP = Ghose-Crippen LogKow; ALogP2 = square of ALogP; AMR = molar refractivity; apol = sum of the atomic polarizabilities (including implicit hydrogens); naAromAtom = number of aromatic atoms; nAromBond = number of aromatic bonds; nAtom = number of atoms; nHeavyAtom = number of heavy atoms; nH = number of hydrogen atoms.
Figure 4. Receiver operating characteristic (ROC) curves of the investigated methods. (a) Random Forest classifier, (b) Support Vector Machine classifier, (c) Stochastic Gradient Boosting classifier, (d) Logistic Regression classifier, and (e) Deep Neural Network classifier.
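The area under a ROC curve has a simple interpretation: it equals the probability that a randomly chosen positive (here, an activator) receives a higher score than a randomly chosen negative (a control), with ties counted as one half. The sketch below computes AUC directly from that definition. It is an illustration only; the scores are made up, the function name is hypothetical, and the record does not say how the authors computed their AUC values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up example: three positives, three negatives, one misordered pair.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, which is fine for a test set of a few hundred compounds; a rank-based formulation would be preferable for much larger sets.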
Figure 5. Activation pathways of AMPK.