Extracting Predictive Representations from Hundreds of Millions of Molecules.

Dong Chen¹,², Jiaxin Zheng¹, Guo-Wei Wei²,³,⁴, Feng Pan¹.

Abstract

The construction of appropriate representations remains essential for molecular predictions due to intricate molecular complexity. Additionally, it is often expensive and ethically constrained to generate labeled data for supervised learning in molecular sciences, leading to challenging small and diverse data sets. In this work, we develop a self-supervised learning approach to pretrain models from over 700 million unlabeled molecules in multiple databases. The intrinsic chemical logic learned from this approach enables the extraction of predictive representations from task-specific molecular sequences in a fine-tuned process. To understand the importance of self-supervised learning from unlabeled molecules, we assemble three models with different combinations of databases. Moreover, we propose a protocol based on data traits to automatically select the optimal model for a specific task. To validate the proposed method, we consider 10 benchmarks and 38 virtual screening data sets. Extensive validation indicates that the proposed method shows superb performance.

Entities: Chemical

Year: 2021; PMID: 34723543; PMCID: PMC9358546; DOI: 10.1021/acs.jpclett.1c03058

Source DB: PubMed; Journal: J Phys Chem Lett; ISSN: 1948-7185; Impact factor: 6.888

29 in total

1. Novel Solubility Prediction Models: Molecular Fingerprints and Physicochemical Features vs Graph Convolutional Neural Networks.

Authors: Sumin Lee; Myeonghun Lee; Ki-Won Gyak; Sung Dug Kim; Mi-Jeong Kim; Kyoungmin Min
Journal: ACS Omega Date: 2022-04-04

1 in total

1. ZINC--a free database of commercially available compounds for virtual screening.

2. Benchmarking sets for molecular docking.

3. Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.

4. Benchmark data set for in silico prediction of Ames mutagenicity.

5. FreeSolv: a database of experimental and calculated hydration free energies, with input files.

6. A review of mathematical representations of biomolecular data. (Review)

7. Algebraic graph-assisted bidirectional transformers for molecular property prediction.

8. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties.

9. Open Babel: An open chemical toolbox.