Machine Learning on DNA-Encoded Library Count Data Using an Uncertainty-Aware Probabilistic Loss Function.

Katherine S Lim^1,2, Andrew G Reidenbach^3, Bruce K Hua^3,4, Jeremy W Mason^3,5, Christopher J Gerry^3,4, Paul A Clemons^3, Connor W Coley^1,3,6.

Abstract

DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find novel small molecules that bind a protein target. Applying QSAR modeling to DEL selection data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has been taken recently by training binary classifiers to learn DEL enrichments of aggregated "disynthons" in order to accommodate the sparse and noisy nature of DEL data. However, a binary classification model cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules, using a custom negative-log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned structure-activity relationships. Our approach explicitly models the Poisson statistics of the sequencing process used in the DEL experimental workflow under a frequentist view. We illustrate this approach on a DEL dataset of 108,528 compounds screened against carbonic anhydrase IX (CAIX), and a dataset of 5,655,000 compounds screened against soluble epoxide hydrolase (sEH) and SIRT2. Due to the treatment of uncertainty in the data through the negative-log-likelihood loss used during training, the models can ignore low-confidence outliers. While our approach does not demonstrate a benefit for extrapolation to novel structures, we expect our denoising and visualization pipeline to be useful in identifying structure-activity trends and highly enriched pharmacophores in DEL data. Further, this approach to uncertainty-aware regression modeling is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
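The idea of a count-aware loss for an enrichment ratio can be illustrated with a minimal sketch. This is not the loss function from the paper, whose exact form is not given in the abstract. It assumes the target-selection count and the control count for a compound are independent Poisson draws whose rates differ by the enrichment ratio R, scaled by the total read counts of the two experiments. Under that assumption, conditioning on the sum of the two counts yields a binomial likelihood in R alone. All function and parameter names below are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def enrichment_mle(k_target, k_control, n_target, n_control):
    """Maximum-likelihood enrichment ratio: normalized target count over normalized control count."""
    if k_control == 0:
        raise ValueError("control count is zero; the ratio is undefined")
    return (k_target / n_target) / (k_control / n_control)


def poisson_ratio_nll(log_r, k_target, k_control, n_target, n_control):
    """Negative log-likelihood of a predicted log-enrichment, up to an additive constant.

    Assumption (not taken from the paper):
        k_target  ~ Poisson(n_target  * R * mu)
        k_control ~ Poisson(n_control * mu)
    Conditioning on k_target + k_control removes the nuisance rate mu and gives
        k_target | total ~ Binomial(total, p),  p = R * n_target / (R * n_target + n_control).
    """
    # logit(p) = log R + log(n_target / n_control)
    logit_p = log_r + math.log(n_target / n_control)
    # -log p = softplus(-logit_p) and -log(1 - p) = softplus(logit_p)
    return k_target * softplus(-logit_p) + k_control * softplus(logit_p)


def penalty(log_r, k_target, k_control, n_target, n_control):
    """Extra loss incurred by predicting log_r instead of the maximum-likelihood value."""
    best = math.log(enrichment_mle(k_target, k_control, n_target, n_control))
    return (poisson_ratio_nll(log_r, k_target, k_control, n_target, n_control)
            - poisson_ratio_nll(best, k_target, k_control, n_target, n_control))


# Two compounds with the same raw enrichment (2x) but very different read depth.
low_depth = penalty(0.0, 2, 1, 1000, 1000)       # 2 vs 1 reads
high_depth = penalty(0.0, 200, 100, 1000, 1000)  # 200 vs 100 reads
print(low_depth, high_depth)
```

In this sketch both compounds have the same maximum-likelihood enrichment, but predicting "no enrichment" costs far more for the compound with 200 and 100 reads than for the one with 2 and 1 reads. A regression model trained on such a loss is therefore pulled only weakly by low-count observations, which is the sense in which a likelihood-based loss can let a model discount low-confidence outliers.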

Year: 2022 PMID: 35535861 DOI: 10.1021/acs.jcim.2c00041

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 6.162

Cited

2 in total

1. Cancer Therapeutic Targeting of Hypoxia Induced Carbonic Anhydrase IX: From Bench to Bedside.

Authors: Paul C McDonald; Shawn C Chafe; Claudiu T Supuran; Shoukat Dedhar
Journal: Cancers (Basel) Date: 2022-07-06 Impact factor: 6.575

2. Discovery of TIGIT inhibitors based on DEL and machine learning.

Authors: Feng Xiong; Mingao Yu; Honggui Xu; Zhenmin Zhong; Zhenwei Li; Yuhan Guo; Tianyuan Zhang; Zhixuan Zeng; Feng Jin; Xun He
Journal: Front Chem Date: 2022-07-26 Impact factor: 5.545
