Machine Learning on DNA-Encoded Library Count Data Using an Uncertainty-Aware Probabilistic Loss Function.

Katherine S Lim, Andrew G Reidenbach, Bruce K Hua, Jeremy W Mason, Christopher J Gerry, Paul A Clemons, Connor W Coley.

Abstract

DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find novel small molecules that bind a protein target. Applying QSAR modeling to DEL selection data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has recently been implemented by training binary classifiers to learn DEL enrichments of aggregated "disynthons" in order to accommodate the sparse and noisy nature of DEL data. However, a binary classification model cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules, using a custom negative-log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned structure-activity relationships. Our approach explicitly models the Poisson statistics of the sequencing process used in the DEL experimental workflow under a frequentist view. We illustrate this approach on a DEL dataset of 108,528 compounds screened against carbonic anhydrase IX (CAIX) and a dataset of 5,655,000 compounds screened against soluble epoxide hydrolase (sEH) and SIRT2. Due to the treatment of uncertainty in the data through the negative-log-likelihood loss used during training, the models can ignore low-confidence outliers. While our approach does not demonstrate a benefit for extrapolation to novel structures, we expect our denoising and visualization pipeline to be useful in identifying structure-activity trends and highly enriched pharmacophores in DEL data.
Further, this approach to uncertainty-aware regression modeling is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
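
As a rough illustration of why a count-based likelihood down-weights low-confidence observations, the sketch below compares sequencing counts between two conditions. It is not the loss function from the paper, whose exact form is not given in this abstract. It uses a standard textbook result instead: if the two counts are independent Poisson variables, then conditional on their sum the target count is binomial, which yields a likelihood for the ratio of the two rates. The function names `enrichment_ratio` and `poisson_ratio_nll`, and the read-depth arguments `n_target` and `n_control`, are hypothetical names chosen for this example.

```python
import math


def enrichment_ratio(k_target, n_target, k_control, n_control):
    """Maximum-likelihood ratio of per-read rates (hypothetical helper).

    k_* are observed counts for one compound; n_* are total reads
    in the target and control conditions.
    """
    if k_control == 0:
        return math.inf  # ratio is unbounded when the control count is zero
    return (k_target / n_target) / (k_control / n_control)


def poisson_ratio_nll(ratio, k_target, n_target, k_control, n_control):
    """Negative log-likelihood of a candidate enrichment ratio.

    Assumes independent Poisson counts. Conditional on the total count,
    k_target ~ Binomial(total, p) with p = R*n_target / (R*n_target + n_control).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    total = k_target + k_control
    p = ratio * n_target / (ratio * n_target + n_control)
    # log of the binomial coefficient, computed with log-gamma for stability
    log_coeff = (
        math.lgamma(total + 1)
        - math.lgamma(k_target + 1)
        - math.lgamma(k_control + 1)
    )
    log_lik = log_coeff + k_target * math.log(p) + k_control * math.log(1.0 - p)
    return -log_lik


def penalty(predicted, k_target, n_target, k_control, n_control):
    """Extra NLL incurred by predicting `predicted` instead of the MLE ratio."""
    best = enrichment_ratio(k_target, n_target, k_control, n_control)
    return (
        poisson_ratio_nll(predicted, k_target, n_target, k_control, n_control)
        - poisson_ratio_nll(best, k_target, n_target, k_control, n_control)
    )


# Two compounds with the same observed ratio (2.0) but very different counts.
low_count_penalty = penalty(1.0, 2, 1000, 1, 1000)
high_count_penalty = penalty(1.0, 200, 1000, 100, 1000)
print(low_count_penalty, high_count_penalty)
```

In this sketch both compounds have the same observed ratio, but a model that predicts a ratio of 1.0 is penalized far more for the compound with 200 and 100 reads than for the one with 2 and 1 reads. That is the sense in which a likelihood over raw counts lets a regression model discount sparse, noisy measurements rather than fit them exactly.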

Year:  2022        PMID: 35535861     DOI: 10.1021/acs.jcim.2c00041

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   6.162


  2 in total

1.  Cancer Therapeutic Targeting of Hypoxia Induced Carbonic Anhydrase IX: From Bench to Bedside.

Authors:  Paul C McDonald; Shawn C Chafe; Claudiu T Supuran; Shoukat Dedhar
Journal:  Cancers (Basel)       Date:  2022-07-06       Impact factor: 6.575

2.  Discovery of TIGIT inhibitors based on DEL and machine learning.

Authors:  Feng Xiong; Mingao Yu; Honggui Xu; Zhenmin Zhong; Zhenwei Li; Yuhan Guo; Tianyuan Zhang; Zhixuan Zeng; Feng Jin; Xun He
Journal:  Front Chem       Date:  2022-07-26       Impact factor: 5.545
