Different molecular enumeration influences in deep learning: an example using aqueous solubility.


Abstract

Aqueous solubility is a key property driving many chemical and biological phenomena, and it affects both experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential yet remains challenging, even with modern computational algorithms. Fingerprint-based, feature-based and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction. It has been clearly demonstrated that the choice of molecular representation affects both model prediction and explainability. In this work, we reviewed different representations and focused on using graph and line notations for modeling. In general, one canonical chemical structure is used to represent one molecule when computing its properties. We carefully examined the commonly used simplified molecular-input line-entry system (SMILES) notation for representing a single molecule and proposed using the full enumeration of SMILES strings to achieve better accuracy. A convolutional neural network (CNN) was used. The full enumeration of SMILES can improve the representation of a molecule and describe the molecule from all possible angles. This CNN model can be very robust when dealing with large datasets, since no additional explicit chemistry knowledge is necessary to predict solubility. Traditionally, it has also been hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the part of a molecule that is relevant to solubility, which can be used to explain the contribution from the CNN.
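To make the idea of SMILES enumeration concrete, the sketch below lists every SMILES string obtainable by changing the starting atom and the branch order of one small molecule. It is an illustration only, not the authors' implementation: the function name `enumerate_smiles` and its `atoms`/`bonds` inputs are hypothetical, and it assumes an acyclic molecule with single bonds and implicit hydrogens, so rings, charges, aromaticity and stereochemistry are out of scope. In practice a cheminformatics toolkit would normally handle this step.

```python
from itertools import permutations


def enumerate_smiles(atoms, bonds):
    """Return all SMILES strings for an acyclic, single-bonded molecule.

    atoms: list of element symbols, e.g. ["C", "C", "O"] (hypothetical input format)
    bonds: list of (i, j) atom-index pairs; assumed to form a tree (no rings)
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def walk(atom, parent):
        # All strings for the subtree rooted at `atom`, entered from `parent`.
        children = [n for n in neighbors[atom] if n != parent]
        if not children:
            return {atoms[atom]}
        results = set()
        for order in permutations(children):
            partial = {atoms[atom]}
            for k, child in enumerate(order):
                subtrees = walk(child, atom)
                is_last = k == len(order) - 1
                # Every child except the last is written as a "(...)" branch.
                partial = {
                    p + (s if is_last else "(" + s + ")")
                    for p in partial
                    for s in subtrees
                }
            results |= partial
        return results

    all_smiles = set()
    for start in range(len(atoms)):  # try every atom as the first character
        all_smiles |= walk(start, None)
    return sorted(all_smiles)


# Ethanol: C-C-O
ethanol = enumerate_smiles(["C", "C", "O"], [(0, 1), (1, 2)])
print(ethanol)  # ['C(C)O', 'C(O)C', 'CCO', 'OCC']
```

All four strings describe the same molecule. Training on every such string rather than on one canonical form is the kind of augmentation the abstract refers to as full enumeration.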

Keywords: biological sciences; cheminformatics; drug discovery; medicinal chemistry

Year: 2021 PMID: 32501508 DOI: 10.1093/bib/bbaa092

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Cited

2 in total

1. A general optimization protocol for molecular property prediction using a deep learning network.

Authors: Jen-Hao Chen; Yufeng Jane Tseng
Journal: Brief Bioinform Date: 2022-01-17 Impact factor: 11.622

2. Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning.

Authors: Liangxu Xie; Lei Xu; Ren Kong; Shan Chang; Xiaojun Xu
Journal: Front Pharmacol Date: 2020-12-18 Impact factor: 5.810
