
Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models.

Wenze Li, Wei Miao, Jingxia Cui, Chao Fang, Shunting Su, Hongzhi Li, LiHong Hu, Yinghua Lu, GuanHua Chen.

Abstract

Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. A universal model that can accommodate the widest variety of databases is therefore highly desirable. This universality may be achieved by ensemble learning, which integrates multiple learners to meet the demands of diversified databases. We therefore propose a general procedure for establishing learning ensembles, based on noncovalent interaction (NCI) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, so a competent machine learning model can be an efficient way to obtain high NCI accuracy with minimal computational resources. With these aims, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks) are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error, RMSE = 0.22 kcal/mol relative to the CCSD(T)/CBS benchmark) by the established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (the RMSE of the best model is further lowered by ∼25%), robustness, and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure for constructing learning ensembles has been applied successfully to several NCI data sets, and it may also be applicable to other chemical databases.
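The idea of a heterogeneous Stacking ensemble that corrects low-level DFT interaction energies toward a reference can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' method: the data are synthetic (not S66, S22, or X40), the single input feature is the DFT energy itself, the two base learners (a least-squares line and a k-nearest-neighbour average) are toy stand-ins, and all function names are hypothetical. The sketch learns the correction (reference minus DFT), blends the two base learners with a weight fitted on out-of-fold predictions, and compares RMSE before and after correction.

```python
import math
import random

def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def fit_linear(xs, ys):
    """Base learner 1 (toy): least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_knn(xs, ys, k=3):
    """Base learner 2 (toy): mean target of the k nearest training points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

BASE_FITTERS = (fit_linear, fit_knn)

def fit_stacking(xs, ys, n_folds=5):
    """Stack the two base learners with a convex blending weight.

    The weight is fitted by least squares on out-of-fold predictions,
    so the meta-learner never sees a base model's fit to its own target.
    """
    n = len(xs)
    oof = [[0.0, 0.0] for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held = [i for i in range(n) if i % n_folds == fold]
        for j, fitter in enumerate(BASE_FITTERS):
            model = fitter([xs[i] for i in train], [ys[i] for i in train])
            for i in held:
                oof[i][j] = model(xs[i])
    # Blend: pred = p1 + w * (p0 - p1); closed-form least-squares w, clipped to [0, 1].
    diff = [p0 - p1 for p0, p1 in oof]
    resid = [y - p1 for y, (_, p1) in zip(ys, oof)]
    denom = sum(d * d for d in diff)
    w = sum(d * r for d, r in zip(diff, resid)) / denom if denom else 0.5
    w = min(1.0, max(0.0, w))
    # Refit both base learners on all training data for the final predictor.
    m0, m1 = (fitter(xs, ys) for fitter in BASE_FITTERS)
    return lambda x: w * m0(x) + (1.0 - w) * m1(x)

# Synthetic data (assumption): "DFT" energies with a systematic error
# relative to a "reference" energy, both in kcal/mol.
rng = random.Random(0)

def synthetic_pair():
    e_dft = rng.uniform(-10.0, 0.0)
    e_ref = 0.85 * e_dft - 0.5 + 0.2 * math.sin(2 * e_dft) + rng.gauss(0.0, 0.05)
    return e_dft, e_ref

data = [synthetic_pair() for _ in range(90)]
train, test = data[:60], data[60:]

# Learn the correction (reference - DFT) from the DFT energy.
x_train = [d for d, _ in train]
delta_train = [r - d for d, r in train]
correct = fit_stacking(x_train, delta_train)

x_test = [d for d, _ in test]
ref_test = [r for _, r in test]
raw_rmse = rmse(x_test, ref_test)
stacked_rmse = rmse([x + correct(x) for x in x_test], ref_test)
print(f"RMSE before correction: {raw_rmse:.3f} kcal/mol")
print(f"RMSE after stacking:    {stacked_rmse:.3f} kcal/mol")
```

In practice a library implementation with richer descriptors and stronger base learners would replace these toy components; the sketch only shows the out-of-fold blending step that distinguishes Stacking from simple averaging.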

Year:  2019        PMID: 30912940     DOI: 10.1021/acs.jcim.8b00878

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  4 in total

1.  LogP prediction performance with the SMD solvation model and the M06 density functional family for SAMPL6 blind prediction challenge molecules.

Authors:  Davy Guan; Raymond Lui; Slade Matthews
Journal:  J Comput Aided Mol Des       Date:  2020-01-14       Impact factor: 3.686

2.  Comprehensive Study of the Chemistry behind the Stability of Carboxylic SWCNT Dispersions in the Development of a Transparent Electrode.

Authors:  Jovana Stanojev; Stevan Armaković; Sara Joksović; Branimir Bajac; Jovan Matović; Vladimir V Srdić
Journal:  Nanomaterials (Basel)       Date:  2022-06-01       Impact factor: 5.719

3.  STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products.

Authors:  Nicholas T Cockroft; Xiaolin Cheng; James R Fuchs
Journal:  J Chem Inf Model       Date:  2019-10-24       Impact factor: 4.956

4.  A protocol for investigating lipidomic dysregulation and discovering lipid biomarkers from human serums.

Authors:  Moran Chen; Yanhong Hao; Suming Chen
Journal:  STAR Protoc       Date:  2022-02-02