Qi Yang, Yao Li, Jin-Dong Yang, Yidi Liu, Long Zhang, Sanzhong Luo, Jin-Pei Cheng.
Abstract
While many approaches to predict aqueous pKa values exist, the fast and accurate prediction of non-aqueous pKa values is still challenging. Based on the iBonD experimental pKa database (39 solvents), a holistic pKa prediction model was established using machine learning. Structural and physical-organic-parameter-based descriptors (SPOC) were introduced to represent the electronic and structural features of the molecules. The models trained with a neural network or the XGBoost algorithm showed the best prediction performance with a low MAE value of 0.87 pKa units. The approach allows a comprehensive mapping of all possible pKa correlations between different solvents and it was validated by predicting the aqueous pKa and micro-pKa of pharmaceutical molecules and pKa values of organocatalysts in DMSO and MeCN with high accuracy. An online prediction platform was constructed based on the current model, which can provide pKa prediction for different types of X-H acidity in the most commonly used solvents.
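For readers unfamiliar with the metric, the MAE (mean absolute error) quoted in the abstract is the average of the absolute differences between predicted and experimental pKa values. The following is only a minimal sketch of that arithmetic in Python: the helper name `mean_absolute_error` is hypothetical, and the numbers are made up for illustration, not taken from the paper or from iBonD.

```python
def mean_absolute_error(experimental, predicted):
    """Return the mean of |experimental - predicted| over paired values."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - p) for e, p in zip(experimental, predicted))
    return total / len(experimental)


# Illustrative values only (assumed, not data from the study).
experimental_pka = [10.0, 12.5, 18.0, 24.0]
predicted_pka = [10.5, 12.0, 19.0, 22.5]

mae = mean_absolute_error(experimental_pka, predicted_pka)
print(f"MAE = {mae:.2f} pKa units")
```

An MAE of 0.87 therefore means that, averaged over the evaluated compounds, a prediction differs from the measured pKa by a little under one pKa unit.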
Keywords: XGBoost; iBond; machine learning; neural network; organocatalysts; pKa prediction
Year: 2020; PMID: 32673431; DOI: 10.1002/anie.202008528
Source DB: PubMed; Journal: Angew Chem Int Ed Engl; ISSN: 1433-7851; Impact factor: 15.336