Youquan Liu1, Yanzhi Guo2, Wengang Wu3, Ying Xiong3, Chuan Sun3, Li Yuan3, Menglong Li4.
1. Research Institute of Natural Gas Technology, Petro China Southwest Oil and Gas Field Company, Chengdu, 610213, China. youquan_l@petrochina.com.cn.
2. College of Chemistry, Sichuan University, Chengdu, Sichuan, 610064, People's Republic of China. yzguo@scu.edu.cn.
3. Research Institute of Natural Gas Technology, Petro China Southwest Oil and Gas Field Company, Chengdu, 610213, China.
4. College of Chemistry, Sichuan University, Chengdu, Sichuan, 610064, People's Republic of China.
Abstract
BACKGROUND: Computational prediction of the inhibition efficiency (IE) of inhibitor molecules is an important complementary route to designing novel molecules that efficiently inhibit corrosion of metallic surfaces.
PURPOSE: We aimed to develop a new machine learning-based predictor for the IE of benzimidazole derivatives.
METHODS: First, the inhibitor molecules were given a comprehensive numerical representation covering energy, electronic, topological, physicochemical and spatial properties derived from their 3-D structures, which yielded 150 valid structural descriptors. These descriptors were then investigated thoroughly. Multicollinearity-based clustering analysis was performed to remove linearly correlated feature variables, producing 47 feature clusters. Gini importance from random forest (RF) was then used to measure the contribution of each descriptor within its cluster, and the descriptor with the highest Gini importance score in each cluster was selected, giving 47 descriptors that are not linearly correlated with one another. Because the number of available inhibitors is limited, different feature subsets were constructed according to the Gini importance ranking of these 47 descriptors.
RESULTS: Support vector machine (SVM) models based on the different feature subsets were tested by leave-one-out cross-validation. In this comparison, the optimal SVM model used the top 11 descriptors with a polynomial kernel. It achieved a correlation coefficient (R) of 0.9589 and a root-mean-square error (RMSE) of 4.45, indicating that the proposed method gives the best performance on the current data.
CONCLUSION: Based on our model, six new benzimidazole molecules were designed, and their predicted IE values indicate that two of them have high potential as corrosion inhibitors.
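The descriptor-selection and validation steps summarized in METHODS and RESULTS can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The greedy grouping rule, the 0.9 correlation threshold, the descriptor names `d1` to `d3`, the importance scores and the mean-value stand-in for the SVM are all assumptions made for illustration. In the study itself, the importance scores come from a random forest and the regressor is an SVM with a polynomial kernel.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def cluster_by_correlation(columns, threshold=0.9):
    """Greedy grouping (assumed rule): a descriptor joins the first cluster
    whose first member it correlates with at |R| >= threshold."""
    clusters = []
    for name in columns:
        for cluster in clusters:
            if abs(pearson(columns[name], columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def pick_representatives(clusters, importance):
    """Keep the descriptor with the highest importance score in each cluster."""
    return [max(cluster, key=lambda name: importance[name]) for cluster in clusters]


def leave_one_out_predictions(X, y, fit, predict):
    """Hold out each sample once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


# Toy data: hypothetical descriptor columns and importance scores.
columns = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly collinear with d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
importance = {"d1": 0.2, "d2": 0.5, "d3": 0.3}  # stand-in for RF Gini scores
selected = pick_representatives(cluster_by_correlation(columns), importance)

# Stand-in regressor (NOT an SVM): predicts the mean of the training targets.
X = [[columns[name][i] for name in selected] for i in range(5)]
y = [60.0, 72.0, 81.0, 88.0, 95.0]  # made-up IE values
preds = leave_one_out_predictions(
    X, y, fit=lambda Xt, yt: sum(yt) / len(yt), predict=lambda model, x: model
)
print(selected, pearson(y, preds), rmse(y, preds))
```

The `fit` and `predict` callables are passed in so that any regressor, including an SVM from a machine learning library, could replace the stand-in without changing the leave-one-out loop.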