
Different molecular enumeration influences in deep learning: an example using aqueous solubility.

Jen-Hao Chen, Yufeng Jane Tseng.   

Abstract

Aqueous solubility is a key property driving many chemical and biological phenomena, and it affects experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential yet remains challenging, even with modern computational algorithms. Fingerprint-based, feature-based and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction, and it has been clearly demonstrated that the choice of molecular representation affects both model prediction and explainability. In this work, we reviewed different representations and focused on using graph and line notations for modeling. In general, one canonical chemical structure is used to represent one molecule when computing its properties. We carefully examined the commonly used simplified molecular-input line-entry system (SMILES) notation, in which a single string normally represents a molecule, and proposed using the full enumeration of SMILES strings for each molecule to achieve better accuracy. A convolutional neural network (CNN) was used as the model. The full enumeration of SMILES can improve the representation of a molecule by describing it from all possible angles. This CNN model can be very robust when dealing with large datasets, since no additional explicit chemistry knowledge is necessary to predict solubility. In addition, it has traditionally been hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the parts of a molecule that are relevant to solubility, which can be used to explain the predictions of the CNN.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
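
The abstract does not say how the SMILES enumeration was generated, so the following is only a minimal sketch of the idea, not the authors' procedure. It assumes a toy setting: an acyclic molecule with single bonds only, hydrogens left implicit, and atoms from the SMILES organic subset. The function name `enumerate_smiles` and its graph input format are hypothetical. It writes one string for every choice of starting atom and every branch ordering, which is what "describing a molecule from all possible angles" amounts to for such a molecule.

```python
from itertools import permutations


def enumerate_smiles(atoms, bonds):
    """Return every SMILES string for an acyclic, single-bonded molecule.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs; the graph must be a tree.
    Toy assumption: no rings, no charges, no stereo, implicit hydrogens.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def walk(atom, parent):
        # All strings for the subtree rooted at `atom`, entered from `parent`.
        children = [n for n in neighbours[atom] if n != parent]
        if not children:
            return {atoms[atom]}
        results = set()
        for order in permutations(children):
            partials = {atoms[atom]}
            for k, child in enumerate(order):
                is_last = k == len(order) - 1
                subtrees = walk(child, atom)
                # Every child except the last is written as a "(...)" branch;
                # the last child continues the main chain.
                partials = {
                    p + (s if is_last else "(" + s + ")")
                    for p in partials
                    for s in subtrees
                }
            results |= partials
        return results

    all_strings = set()
    for start in range(len(atoms)):  # every atom can start the string
        all_strings |= walk(start, None)
    return sorted(all_strings)


# Ethanol: C0-C1-O2
print(enumerate_smiles(["C", "C", "O"], [(0, 1), (1, 2)]))
# → ['C(C)O', 'C(O)C', 'CCO', 'OCC']
```

All four strings denote the same molecule, so in a setup like the one the abstract describes each of them would be paired with the same solubility value. Real molecules with rings, multiple bonds or stereochemistry need a cheminformatics toolkit rather than this toy walker.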

Keywords:  biological sciences; cheminformatics; drug discovery; medicinal chemistry

Year: 2021   PMID: 32501508   DOI: 10.1093/bib/bbaa092

Source DB: PubMed   Journal: Brief Bioinform   ISSN: 1467-5463   Impact factor: 11.622


  2 in total

1.  A general optimization protocol for molecular property prediction using a deep learning network.

Authors:  Jen-Hao Chen; Yufeng Jane Tseng
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

2.  Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning.

Authors:  Liangxu Xie; Lei Xu; Ren Kong; Shan Chang; Xiaojun Xu
Journal:  Front Pharmacol       Date:  2020-12-18       Impact factor: 5.810

