General melting point prediction based on a diverse compound data set and artificial neural networks.

M Karthikeyan, Robert C Glen, Andreas Bender.

Abstract

We report the development of a robust and general model for the prediction of melting points. It is based on a diverse data set of 4173 compounds and employs a large number of 2D and 3D descriptors to capture molecular physicochemical and other graph-based properties. Dimensionality reduction is performed by principal component analysis, while a fully connected feed-forward back-propagation artificial neural network is employed for model generation. The melting point is a fundamental physicochemical property of a molecule that is controlled both by single-molecule properties and by intermolecular interactions arising from packing in the solid state. It is therefore difficult to predict, and previously only melting point models for clearly defined, smaller compound sets have been developed. Here we derive the first general model that covers a comparatively large and relevant part of organic chemical space. The final model is based on 2D descriptors, which were found to contain more relevant information than the 3D descriptors calculated. Internal random validation of the model achieves a correlation coefficient of R² = 0.661 with an average absolute error of 37.6 °C. The model is internally consistent, with a correlation coefficient of Q² = 0.658 on the test set (average absolute error 38.2 °C) and Q² = 0.645 on the internal validation set (average absolute error 39.8 °C). Additional validation was performed on an external drug data set of 277 compounds, on which a correlation coefficient of Q² = 0.662 (average absolute error 32.6 °C) was achieved, showing the ability of the model to generalize. Compared to an earlier model for the prediction of melting points of druglike compounds, our model exhibits slightly improved performance despite covering a much larger chemical space. The remaining model error is attributed to molecular properties that single-molecule descriptors do not capture, namely inter- and intramolecular interactions and crystal packing; examples of outliers and the reasons for them are given.
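The pipeline summarized in the abstract (principal component analysis for dimensionality reduction, then a fully connected feed-forward network trained by back-propagation, scored by R²/Q² and average absolute error) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the power-iteration PCA, the single tanh hidden layer, the learning rate and the epoch count are all hypothetical choices. R²/Q² is computed here as 1 - SS_res/SS_tot, one common QSPR convention; the abstract does not state which definition the paper used.

```python
import math
import random

# Hypothetical sketch: names and hyperparameters are illustrative assumptions, not from the paper.


def mean(xs):
    return sum(xs) / len(xs)


def pca_fit(X, k, iters=200, seed=0):
    """Top-k principal axes of descriptor matrix X via power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centred descriptors
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient = variance along v; deflate so the next pass finds the next axis
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        axes.append(v)
    return mu, axes


def pca_transform(row, mu, axes):
    """Project one descriptor vector onto the retained principal axes."""
    return [sum((row[j] - mu[j]) * a[j] for j in range(len(mu))) for a in axes]


def train_mlp(Z, y, hidden=4, lr=0.01, epochs=300, seed=0):
    """One-hidden-layer tanh network trained by per-sample back-propagation on squared error."""
    rng = random.Random(seed)
    k = len(Z[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(z):
        h = [math.tanh(sum(w * x for w, x in zip(W1[i], z)) + b1[i]) for i in range(hidden)]
        return h, sum(w * hi for w, hi in zip(W2, h)) + b2

    for _ in range(epochs):
        for z, t in zip(Z, y):
            h, out = forward(z)
            err = out - t  # derivative of 0.5 * (out - t)^2 with respect to out
            # Hidden-layer deltas must use W2 before it is updated
            delta = [err * W2[i] * (1.0 - h[i] ** 2) for i in range(hidden)]
            W2 = [W2[i] - lr * err * h[i] for i in range(hidden)]
            b2 -= lr * err
            for i in range(hidden):
                W1[i] = [W1[i][j] - lr * delta[i] * z[j] for j in range(k)]
                b1[i] -= lr * delta[i]
    return lambda z: forward(z)[1]


def r_squared(y, p):
    """1 - SS_res / SS_tot on a given set (assumed definition of R^2 / Q^2)."""
    ybar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def mean_abs_error(y, p):
    """Average absolute error, e.g. in degrees C for melting points."""
    return mean([abs(a - b) for a, b in zip(y, p)])
```

In use, the PCA means and axes would be fitted on the training descriptors only and then reused to project the test, validation and external sets, so that no held-out information leaks into the model. The network is trained on the training-set scores, and `r_squared` and `mean_abs_error` are reported separately for each set, mirroring the internal test, internal validation and external drug-set figures quoted in the abstract.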

Year:  2005        PMID: 15921448     DOI: 10.1021/ci0500132

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  10 in total

1.  Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features.

Authors:  Dongsheng Cao; Yizeng Liang; Qingsong Xu; Yifeng Yun; Hongdong Li
Journal:  J Comput Aided Mol Des       Date:  2010-11-13       Impact factor: 3.686

2.  Recent progress in the computational prediction of aqueous solubility and absorption. (Review)

Authors:  Stephen R Johnson; Weifan Zheng
Journal:  AAPS J       Date:  2006-02-03       Impact factor: 4.009

3.  Alpha shapes applied to molecular shape characterization exhibit novel properties compared to established shape descriptors.

Authors:  J Anthony Wilson; Andreas Bender; Taner Kaya; Paul A Clemons
Journal:  J Chem Inf Model       Date:  2009-10       Impact factor: 4.956

4.  Quantitative global studies of reactomes and metabolomes using a vectorial representation of reactions and chemical compounds.

Authors:  Juan C Triviño; Florencio Pazos
Journal:  BMC Syst Biol       Date:  2010-04-20

5.  Ab Initio Machine Learning in Chemical Compound Space. (Review)

Authors:  Bing Huang; O Anatole von Lilienfeld
Journal:  Chem Rev       Date:  2021-08-13       Impact factor: 60.622

6.  Cross-validation pitfalls when selecting and assessing regression and classification models.

Authors:  Damjan Krstajic; Ljubomir J Buturovic; David E Leahy; Simon Thomas
Journal:  J Cheminform       Date:  2014-03-29       Impact factor: 5.514

7.  Matched Molecular Pair Analysis on Large Melting Point Datasets: A Big Data Perspective.

Authors:  Michael Withnall; Hongming Chen; Igor V Tetko
Journal:  ChemMedChem       Date:  2017-08-23       Impact factor: 3.466

8.  A deep neural network model for packing density predictions and its application in the study of 1.5 million organic molecules.

Authors:  Mohammad Atif Faiz Afzal; Aditya Sonpal; Mojtaba Haghighatlari; Andrew J Schultz; Johannes Hachmann
Journal:  Chem Sci       Date:  2019-07-09       Impact factor: 9.825

9.  Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction.

Authors:  Noel M O'Boyle; David S Palmer; Florian Nigsch; John Bo Mitchell
Journal:  Chem Cent J       Date:  2008-10-29       Impact factor: 4.215

10.  How accurately can we predict the melting points of drug-like compounds?

Authors:  Igor V Tetko; Yurii Sushko; Sergii Novotarskyi; Luc Patiny; Ivan Kondratov; Alexander E Petrenko; Larisa Charochkina; Abdullah M Asiri
Journal:  J Chem Inf Model       Date:  2014-12-09       Impact factor: 4.956
