Melting point prediction employing k-nearest neighbor algorithms and genetic parameter optimization.

Florian Nigsch, Andreas Bender, Bernd van Buuren, Jos Tissen, Eduard Nigsch, John B O Mitchell.

Abstract

We have applied the k-nearest neighbor (kNN) modeling technique to the prediction of melting points. A data set of 4119 diverse organic molecules (data set 1) and an additional set of 277 drugs (data set 2) were used to compare performance in different regions of chemical space, and we investigated the influence of the number of nearest neighbors using different types of molecular descriptors. To compute the prediction on the basis of the melting temperatures of the nearest neighbors, we used four different methods (arithmetic and geometric average, inverse distance weighting, and exponential weighting), of which the exponential weighting scheme yielded the best results. We assessed our model via a 25-fold Monte Carlo cross-validation (with approximately 30% of the total data as a test set) and optimized it using a genetic algorithm. Predictions for drugs based on drugs (separate training and test sets each taken from data set 2) were found to be considerably better [root-mean-squared error (RMSE) = 46.3 °C, r² = 0.30] than those based on nondrugs (prediction of data set 2 based on the training set from data set 1, RMSE = 50.3 °C, r² = 0.20). The optimized model yields an average RMSE as low as 46.2 °C (r² = 0.49) for data set 1, and an average RMSE of 42.2 °C (r² = 0.42) for data set 2. It is shown that the kNN method inherently introduces a systematic error in melting point prediction. Much of the remaining error can be attributed to the lack of information about interactions in the liquid state, which are not well captured by molecular descriptors.
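The abstract names four ways to turn the melting temperatures of the k nearest neighbors into a prediction. It also describes an evaluation protocol of 25 Monte Carlo train/test splits with about 30% of the data held out. The plain-Python sketch below illustrates those pieces. It relies on several assumptions that the abstract does not state:

* Euclidean distance on descriptor vectors.
* Inverse-distance weights of 1/d.
* Exponential weights exp(-alpha·d), where `alpha` is a hypothetical decay parameter.
* A geometric mean computed in kelvin, because Celsius values can be zero or negative.
* r² computed as the squared Pearson correlation.

All function names are hypothetical. The genetic-algorithm optimization of parameters such as `k` and `alpha` is not shown.

```python
import math
import random

# Hypothetical sketch. Assumed (not stated in the abstract): Euclidean
# distance, weights 1/d and exp(-alpha*d), r^2 as squared Pearson correlation.


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_mp(query, train_X, train_y, k=5, scheme="exponential", alpha=1.0):
    """Predict a melting point (deg C) from the k nearest training molecules."""
    # Rank training molecules by distance to the query; keep the k closest.
    ranked = sorted((euclidean(query, x), y) for x, y in zip(train_X, train_y))
    neighbors = ranked[:k]
    dists = [d for d, _ in neighbors]
    temps = [t for _, t in neighbors]

    if scheme == "arithmetic":
        return sum(temps) / len(temps)
    if scheme == "geometric":
        # Averaged in kelvin, since Celsius values can be zero or negative.
        logs = [math.log(t + 273.15) for t in temps]
        return math.exp(sum(logs) / len(logs)) - 273.15
    if scheme == "inverse_distance":
        weights = [1.0 / (d + 1e-9) for d in dists]  # epsilon avoids 1/0
    elif scheme == "exponential":
        weights = [math.exp(-alpha * d) for d in dists]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sum(w * t for w, t in zip(weights, temps)) / sum(weights)


def rmse_r2(y_true, y_pred):
    """Root-mean-squared error and squared Pearson correlation."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = cov ** 2 / (var_t * var_p) if var_t * var_p > 0 else 0.0
    return rmse, r2


def monte_carlo_cv(X, y, n_splits=25, test_fraction=0.3, seed=0, **knn_kwargs):
    """Average RMSE and r^2 over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = round(test_fraction * len(X))
    results = []
    for _ in range(n_splits):
        rng.shuffle(idx)  # a fresh random split on every repetition
        test, train = idx[:n_test], idx[n_test:]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        preds = [predict_mp(X[i], train_X, train_y, **knn_kwargs) for i in test]
        results.append(rmse_r2([y[i] for i in test], preds))
    return (sum(r for r, _ in results) / n_splits,
            sum(q for _, q in results) / n_splits)


if __name__ == "__main__":
    # Synthetic toy data, NOT from the paper: two descriptors per "molecule".
    rng = random.Random(42)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [100 + 80 * a - 40 * b + rng.gauss(0, 10) for a, b in X]
    for scheme in ("arithmetic", "geometric", "inverse_distance", "exponential"):
        rmse, r2 = monte_carlo_cv(X, y, k=5, scheme=scheme, alpha=5.0)
        print(f"{scheme:>16}: RMSE = {rmse:.1f} deg C, r2 = {r2:.2f}")
```

Running the file prints the mean RMSE and r² for each scheme on synthetic data. These numbers only exercise the code and should not be compared with the values reported in the abstract. The neighbor count `k` and the decay `alpha` are the kind of parameters a genetic algorithm could tune, but that step is omitted here.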

Year: 2006 | PMID: 17125183 | DOI: 10.1021/ci060149f

Source DB: PubMed | Journal: J Chem Inf Model | ISSN: 1549-9596 | Impact factor: 4.956


14 in total (10 listed below)

1.  Comparative Chemometric Analysis for Classification of Acids and Bases via a Colorimetric Sensor Array.

Authors: Michael J Kangas; Raychelle M Burks; Jordyn Atwater; Rachel M Lukowicz; Billy Garver; Andrea E Holmes
Journal: J Chemom | Date: 2017-10-13 | Impact factor: 2.467

2.  An Improved Comparison of Chemometric Analyses for the Identification of Acids and Bases With Colorimetric Sensor Arrays.

Authors: Michael James Kangas; Christina L Wilson; Raychelle M Burks; Jordyn Atwater; Rachel M Lukowicz; Billy Garver; Miles Mayer; Shana Havenridge; Andrea E Holmes
Journal: Int J Chem | Date: 2018-04-25

3.  DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.

Authors: Yan Zhang; Zhiwen Jiang; Cheng Chen; Qinqin Wei; Haiming Gu; Bin Yu
Journal: Interdiscip Sci | Date: 2021-11-03 | Impact factor: 2.233

4.  Prediction of risk-associated genes and high-risk liver cancer patients from their mutation profile: benchmarking of mutation calling techniques.

Authors: Sumeet Patiyal; Anjali Dhall; Gajendra P S Raghava
Journal: Biol Methods Protoc | Date: 2022-05-27

5.  Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions.

Authors: Faizan Sahigara; Davide Ballabio; Roberto Todeschini; Viviana Consonni
Journal: J Cheminform | Date: 2013-05-30 | Impact factor: 5.514

6.  Early Detection of Tomato Spotted Wilt Virus by Hyperspectral Imaging and Outlier Removal Auxiliary Classifier Generative Adversarial Nets (OR-AC-GAN).

Authors: Dongyi Wang; Robert Vinson; Maxwell Holmes; Gary Seibel; Avital Bechar; Shimon Nof; Yang Tao
Journal: Sci Rep | Date: 2019-03-13 | Impact factor: 4.379

7.  Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction.

Authors: Noel M O'Boyle; David S Palmer; Florian Nigsch; John B O Mitchell
Journal: Chem Cent J | Date: 2008-10-29 | Impact factor: 4.215

8.  Machine learning methods in chemoinformatics.

Authors: John B O Mitchell
Journal: Wiley Interdiscip Rev Comput Mol Sci | Date: 2014-09-01

9.  The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS.

Authors: Igor V Tetko; Daniel M Lowe; Antony J Williams
Journal: J Cheminform | Date: 2016-01-22 | Impact factor: 5.514

10.  How accurately can we predict the melting points of drug-like compounds?

Authors: Igor V Tetko; Yurii Sushko; Sergii Novotarskyi; Luc Patiny; Ivan Kondratov; Alexander E Petrenko; Larisa Charochkina; Abdullah M Asiri
Journal: J Chem Inf Model | Date: 2014-12-09 | Impact factor: 4.956
