
Improved Chemical Prediction from Scarce Data Sets via Latent Space Enrichment.

Nicolae C Iovanac, Brett M Savoie.

Abstract

Modern machine learning provides promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas experimental data remain costly and scarce, and computational models are unavailable for targeted figures of merit. Here we report a promising pathway to address this challenge by using chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to enable improved prediction in data-scarce applications. The approach is demonstrated for pKa prediction of moderately sized molecular species using a combination of experimentally available pKa data and density functional theory-based characterizations of the (de)protonation free energy. A novel autoencoder framework is used to create a continuous chemical latent space that is then used in single and joint training tasks for property prediction. By combining these two data sets in a jointly trained autoencoder framework, we observe mutual improvement in property prediction tasks in the scarce data limit. We also demonstrate an enrichment mechanism that is unique to latent space training, whereby training on excess computational data can mitigate the prediction losses associated with scarce experimental data and advantageously organize the latent space. These results demonstrate that disparate chemical data sources can be advantageously combined in an autoencoder framework with potential general application to data-scarce chemical learning tasks.
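
The joint training idea in the abstract can be illustrated with a small sketch of a combined objective. This is not the authors' architecture or loss function, which the abstract does not specify. It assumes a hypothetical setup in which a model produces a reconstruction error plus two predictions per molecule (an experimental pKa and a computed (de)protonation free energy), and in which a missing label is marked with `None` so that it contributes nothing to the loss. The function names `masked_mse` and `joint_loss` and the weights `w_pka` and `w_dg` are illustrative assumptions.

```python
def masked_mse(predictions, targets):
    """Mean squared error over entries whose target is not None.

    Returns 0.0 when no target is available, so an unlabeled batch
    adds nothing to the loss for that property.
    """
    pairs = [(p, t) for p, t in zip(predictions, targets) if t is not None]
    if not pairs:
        return 0.0
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def joint_loss(recon_error, pka_pred, pka_true, dg_pred, dg_true,
               w_pka=1.0, w_dg=1.0):
    """Hypothetical joint objective: reconstruction + two property terms.

    recon_error : scalar reconstruction error of the autoencoder (assumed given)
    pka_*       : predicted / experimental pKa values (None = not measured)
    dg_*        : predicted / computed free energies (None = not computed)
    w_pka, w_dg : illustrative weights on the two property terms
    """
    return (recon_error
            + w_pka * masked_mse(pka_pred, pka_true)
            + w_dg * masked_mse(dg_pred, dg_true))


# Three molecules: pKa is measured only for the first (scarce data),
# while a computed free energy is available for all three.
loss = joint_loss(
    recon_error=0.5,
    pka_pred=[4.5, 7.0, 9.0], pka_true=[4.0, None, None],
    dg_pred=[10.0, 12.0, 15.0], dg_true=[11.0, 12.0, 13.0],
)
print(round(loss, 4))
```

In this toy setting the abundant computed labels still contribute to the objective for molecules that lack an experimental pKa, which is one simple way that plentiful computational data could shape a shared latent representation.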

Year:  2019        PMID: 31032614     DOI: 10.1021/acs.jpca.9b01398

Source DB:  PubMed          Journal:  J Phys Chem A        ISSN: 1089-5639            Impact factor:   2.781


  4 in total

1.  Targeted sequence design within the coarse-grained polymer genome.

Authors:  Michael A Webb; Nicholas E Jackson; Phwey S Gil; Juan J de Pablo
Journal:  Sci Adv       Date:  2020-10-21       Impact factor: 14.136

2.  CRNNTL: Convolutional Recurrent Neural Network and Transfer Learning for QSAR Modeling in Organic Drug and Material Discovery.

Authors:  Yaqin Li; Yongjin Xu; Yi Yu
Journal:  Molecules       Date:  2021-11-30       Impact factor: 4.411

3.  Automation and data-driven design of polymer therapeutics (Review).

Authors:  Rahul Upadhya; Shashank Kosuri; Matthew Tamasi; Travis A Meyer; Supriya Atta; Michael A Webb; Adam J Gormley
Journal:  Adv Drug Deliv Rev       Date:  2020-11-24       Impact factor: 15.470

4.  A quantitative uncertainty metric controls error in neural network-driven chemical discovery.

Authors:  Jon Paul Janet; Chenru Duan; Tzuhsiung Yang; Aditya Nandy; Heather J Kulik
Journal:  Chem Sci       Date:  2019-07-11       Impact factor: 9.825

