Jiacheng Xiong1,2, Zhaojun Li3, Guangchao Wang4, Zunyun Fu1, Feisheng Zhong1,2, Tingyang Xu5, Xiaomeng Liu1,2, Ziming Huang1,2, Xiaohong Liu1,3,6, Kaixian Chen1,2, Hualiang Jiang1,2,6, Mingyue Zheng1,2.
Abstract
MOTIVATION: The acid dissociation constant (pKa) is a critical parameter reflecting the ionization ability of chemical compounds and is widely applied in a variety of industries. However, the experimental determination of pKa is intricate and time-consuming, especially for the exact determination of micro-pKa information at the atomic level. Hence, fast and accurate prediction of the pKa values of chemical compounds is of broad interest.
Year: 2021 PMID: 34643666 PMCID: PMC8756178 DOI: 10.1093/bioinformatics/btab714
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The relationship between macro- and micro-pKa of basic compounds. pKa(macro) refers to the macro-pKa; pKa1, pKa2 and pKa3 refer to the micro-pKa
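The caption above does not state the formula that links the two quantities. As a minimal sketch, assuming a single protonation step, non-interacting sites and equilibrium between microstates, the textbook relation is 1/Ka(macro) = Σ 1/Ka_i for a base with several protonation sites, and Ka(macro) = Σ Ka_i for an acid with several deprotonation sites. The function names and the example values below are hypothetical and are not taken from the article.

```python
import math


def macro_pka_base(micro_pkas):
    """Macro-pKa of a base from the micro-pKa values of its protonation sites.

    Assumes one protonation step: the protonated population is the sum of
    the site-specific microstates, so 1/Ka(macro) = sum_i 1/Ka_i.
    """
    return math.log10(sum(10.0 ** p for p in micro_pkas))


def macro_pka_acid(micro_pkas):
    """Macro-pKa of an acid from the micro-pKa values of its deprotonation sites.

    Assumes one deprotonation step: the deprotonated population is the sum
    of the site-specific microstates, so Ka(macro) = sum_i Ka_i.
    """
    return -math.log10(sum(10.0 ** (-p) for p in micro_pkas))


if __name__ == "__main__":
    # Hypothetical micro-pKa values for a base with three ionizable sites.
    print(round(macro_pka_base([9.0, 9.0, 6.5]), 3))
    # Hypothetical micro-pKa values for an acid with two ionizable sites.
    print(round(macro_pka_acid([4.0, 4.0]), 3))
```

Under these assumptions the macro-pKa of a base is never lower than its largest micro-pKa, and the macro-pKa of an acid is never higher than its smallest micro-pKa, so the most basic (or most acidic) site dominates the macroscopic value.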
Fig. 2. The schematic representation of the proposed Graph-pKa model
Fig. 3. The distributions of simple compound properties in the S-pKa dataset. (a) The experimental pKa values. (b) The number of ionizable sites
Fig. 4. The performance of the various models on macro-pKa prediction on the S-pKa dataset. (a,b) The MAE and R² of those models on the test dataset. (c,d) The MAE of those models for acidic (c) and basic (d) pKa prediction on a series of similarity subsets. Error bars represent standard deviations
Performance of Graph-pKa and other models on the SAMPL6 and SAMPL7 external test sets
| Dataset | Model name | Model class | MAE | RMSE | R² |
|---|---|---|---|---|---|
| SAMPL6 | Epik Scan | Commercial | 0.784 | 0.962 | 0.857 |
| SAMPL6 | Epik Micro | Commercial | 0.783 | 0.972 | 0.854 |
| SAMPL6 | ACD/pKa^a | Commercial | | 0.783 | 0.905 |
| SAMPL6 | MoKa | Commercial | 0.854 | 0.970 | 0.854 |
| SAMPL6 | ChemAxon | Commercial | 1.007 | 1.248 | 0.759 |
| SAMPL6 | Hunt’s model | Academic | 0.687 | 0.864 | 0.885 |
| SAMPL6 | Yang’s XGB | Academic | 0.767 | 1.011 | 0.842 |
| SAMPL6 | Yang’s NN | Academic | 0.832 | 1.141 | 0.799 |
| SAMPL6 | OPERA | Academic | 0.970 | 1.283 | 0.619 |
| SAMPL6 | Graph-pKa | Academic | 0.594 | | |
| SAMPL7 | Epik Scan | Commercial | 1.121 | 1.648 | 0.508 |
| SAMPL7 | ChemAxon | Commercial | | | |
| SAMPL7 | Yang’s XGB | Academic | 1.476 | 1.622 | 0.523 |
| SAMPL7 | Yang’s NN | Academic | 0.932 | 1.156 | 0.758 |
| SAMPL7 | OPERA | Academic | 2.135 | 2.515 | −3.752 |
| SAMPL7 | Graph-pKa | Academic | 0.758 | 0.934 | 0.839 |
The bold entries in the “MAE”, “RMSE”, and “R²” columns represent the best results in corresponding datasets.
The results are cited from a summary of the SAMPL6 challenge results (https://github.com/samplchallenges/SAMPL6/blob/master/physical_properties/pKa/analysis/).
The results are cited from the articles of Hunt and Yang.
The results of Epik predictions are from Schrödinger Suite 2017; the results of ChemAxon predictions are from ChemAxon Marvin Suite 20.15.0. The results of Yang’s XGB and Yang’s NN are from a webserver (http://pka.luoszgroup.com/prediction).
The results are from OPERA 2.7. Nine pKa values that OPERA 2.7 failed to predict were excluded.
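The three metrics reported in the table can be computed from paired experimental and predicted pKa values as sketched below. This is an illustration under one stated assumption: R² is taken to be the coefficient of determination, 1 − SS_res/SS_tot, which is consistent with the negative value reported for OPERA on SAMPL7 (a squared correlation coefficient could not be negative). The function name and the example values are hypothetical and are not data from the article.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for paired experimental and predicted values.

    R2 is the coefficient of determination, 1 - SS_res / SS_tot, and is
    negative when the predictions are worse than always predicting the
    mean of the experimental values.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


if __name__ == "__main__":
    # Hypothetical experimental and predicted pKa values.
    experimental = [3.2, 4.8, 7.1, 9.4, 10.3]
    predicted = [3.6, 4.5, 7.9, 9.0, 10.9]
    mae, rmse, r2 = regression_metrics(experimental, predicted)
    print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which matches every complete row in the table.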
Fig. 5. Application of Graph-pKa to predict the dominant ionization sites of molecules. (a) The consistency rates between the predictions of Graph-pKa and the judgment of human experts. (b) The distribution of difference values representing the degree of divergence between Graph-pKa and human experts on controversial molecules. (c,d) Examples of molecules on which the predictions of Graph-pKa and human experts are consistent (c) and different (d). Arrows and circles denote the dominant ionization sites selected by Graph-pKa and human experts, respectively; red and blue numbers denote the acidic and basic pKa values of atoms predicted by Graph-pKa, respectively. (e) Some molecules and their pKa values for reference
Fig. 6. Visualizing the atomic embeddings in the last hidden layer using principal component analysis