Feng Xiong1, Mingao Yu2, Honggui Xu2, Zhenmin Zhong1, Zhenwei Li1, Yuhan Guo2, Tianyuan Zhang2, Zhixuan Zeng1, Feng Jin2, Xun He1.
Abstract
Drug discovery has entered a period of rapid development driven by technologies such as DNA-encoded libraries (DEL) and artificial intelligence (AI). Previous DEL-AI combinations have been applied successfully to classical kinase and receptor targets, mainly building on known scaffolds. To date, no DEL-AI combination has been reported for inhibitors of protein-protein interactions, including undruggable targets with few or no known active scaffolds. Here, we applied DEL technology to the T cell immunoglobulin and ITIM domain (TIGIT) target and identified a unique hit, compound 1 (IC50 = 20.7 μM). Using the DEL screening data together with hit derivatives a1-a34, we established a machine learning (ML) modeling process to address the challenge of poorly uniform sample distribution, which is frequently encountered in DEL screening on new targets. The resulting ML model achieved a satisfactory hit rate of about 75% for derivatives in the high-scoring region.
Keywords: DNA-encoded library; TIGIT; anti-tumor; machine learning; protein-protein interaction
Year: 2022 PMID: 35958238 PMCID: PMC9360614 DOI: 10.3389/fchem.2022.982539
Source DB: PubMed Journal: Front Chem ISSN: 2296-2646 Impact factor: 5.545
FIGURE 1. Workflow for DEL screening using immobilized proteins.
SCHEME 1. Structures and corresponding protein-protein blocking activities against the TIGIT/CD155 complex (IC50, µM) of compound 1 and its derivatives a1-a34. Derivatives a1-a23 were single-site substituted (at R1, R2, or R3); derivatives a24-a30 were multi-site substituted (R1-R3); derivatives a31-a34 carried modifications, including cyclization, on the scaffold amine group (R4) and the ortho carbons (R5).
FIGURE 2. Discovery of novel TIGIT ligands by DEL screening. (A) Two-dimensional display of post-selection DNA sequencing data from TIGIT selection. x-axis: post-selection sequence counts; y-axis: post-selection enrichment fold = (post-selection counts %)/(pre-selection counts %). (B) Structure of hit 1. (C) Blocking effects on TIGIT/CD155 complex in HTRF assay for hit 1.
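The enrichment fold on the y-axis of panel (A) can be illustrated with a short calculation. The sketch below is not the authors' pipeline; the function name `enrichment_fold`, the dictionary inputs, and the compound labels are hypothetical, and it assumes each "counts %" is a compound's share of the total reads in the corresponding sequencing run.

```python
def enrichment_fold(pre_counts, post_counts):
    """Return {compound: fold}, where fold = (post-selection %) / (pre-selection %).

    Assumption: each percentage is the compound's share of all reads
    in that sequencing run. Compounds with no pre-selection reads are
    skipped because the ratio is undefined for them.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    folds = {}
    for compound, post in post_counts.items():
        pre = pre_counts.get(compound, 0)
        if pre == 0:
            continue
        folds[compound] = (post / post_total) / (pre / pre_total)
    return folds


# Hypothetical read counts for three library members.
pre = {"A": 100, "B": 100, "C": 800}
post = {"A": 60, "B": 10, "C": 30}
folds = enrichment_fold(pre, post)

# Rank compounds from most to least enriched.
ranked = sorted(folds, key=folds.get, reverse=True)
```

In this toy input, compound A grows from 10% of the pre-selection reads to 60% of the post-selection reads, so it ranks first, while C is depleted.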
Performance of the MLP model at representative undersampling intervals (N) and oversampling multiples (complete data are provided in the supporting information).
| Model | Oversample_multiple | N | Train_mse | Valid1_mse | Valid2_mse | Valid2_ratio | Test1_mse | Test2_mse | Test2_ratio |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 400 | 4 | 0.0038 | 0.0039 | 0.153 | 0.33 | 0.0040 | 0.104 | 0.17 |
| MLP | 400 | 6 | 0.0037 | 0.0041 | 0.137 | 0.67 | 0.0041 | 0.101 | 0.50 |
| MLP | 400 | 8 | 0.0038 | 0.0045 | 0.150 | 0.50 | 0.0045 | 0.111 | 0.67 |
| MLP | 600 | 2 | 0.0038 | 0.0038 | 0.147 | 0.17 | 0.0038 | 0.100 | 0.33 |
| MLP | 600 | 4 | 0.0037 | 0.004 | 0.148 | 0.50 | 0.0041 | 0.104 | 0.67 |
| MLP | 600 | 6 | 0.0037 | 0.0043 | 0.142 | 0.67 | 0.0043 | 0.098 | 0.50 |
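The two resampling settings in the table can be sketched as follows. This is only one plausible reading, since the record does not define the terms: it assumes that an undersampling interval N means keeping every N-th negative (non-enriched) sample, and that an oversampling multiple means repeating each positive sample that many times. The function name `resample` and the toy data are hypothetical.

```python
def resample(negatives, positives, n_interval, oversample_multiple):
    """Build a rebalanced training list.

    Assumptions (not confirmed by the source):
    - undersampling keeps every n_interval-th negative sample;
    - oversampling repeats the positive samples oversample_multiple times.
    """
    kept_negatives = negatives[::n_interval]
    repeated_positives = positives * oversample_multiple
    return kept_negatives + repeated_positives


# Toy data: 12 negatives and 2 positives.
negatives = [f"neg_{i}" for i in range(12)]
positives = ["hit_a", "hit_b"]
train = resample(negatives, positives, n_interval=4, oversample_multiple=3)
```

With these toy inputs, 3 of the 12 negatives are kept and the 2 positives appear 3 times each, which shifts the positive share from 2 in 14 to 6 in 9.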
Performance of the lightGBM model at representative undersampling intervals (N) and oversampling multiples (complete data are provided in the supporting information).
| Model | Oversample_multiple | N | Train_mse | Valid1_mse | Valid2_mse | Valid2_ratio | Test1_mse | Test2_mse | Test2_ratio |
|---|---|---|---|---|---|---|---|---|---|
| lightGBM | 600 | 4 | 0.0074 | 0.0036 | 0.172 | 0.50 | 0.0046 | 0.162 | 0.67 |
| lightGBM | 600 | 6 | 0.0027 | 0.0030 | 0.167 | 0.50 | 0.0029 | 0.191 | 0.33 |
| lightGBM | 600 | 8 | 0.0036 | 0.0045 | 0.170 | 0.33 | 0.0034 | 0.134 | 0.67 |
| lightGBM | 800 | 2 | 0.0064 | 0.0039 | 0.178 | 0.50 | 0.0045 | 0.170 | 0.50 |
| lightGBM | 800 | 4 | 0.0036 | 0.0032 | 0.131 | 0.67 | 0.0043 | 0.168 | 0.33 |
| lightGBM | 800 | 6 | 0.0086 | 0.0054 | 0.172 | 0.50 | 0.0054 | 0.172 | 0.33 |
| lightGBM | 800 | 8 | 0.0032 | 0.0045 | 0.138 | 0.67 | 0.0033 | 0.138 | 0.50 |
Average results obtained by randomly modifying 1–4 bits of the molecular fingerprint (setting bits with value 0 to 1, or the reverse) and repeating ten times.
| Model | Modify_bit | Modify_type | Train_mse | Valid1_mse | Valid2_mse | Valid2_ratio | Test1_mse | Test2_mse | Test2_ratio |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 1 | 1 | 0.0049 | 0.0043 | 0.157 | 0.32 | 0.0041 | 0.143 | 0.37 |
| MLP | 1 | 2 | 0.0052 | 0.0045 | 0.158 | 0.36 | 0.0043 | 0.152 | 0.33 |
| MLP | 2 | 1 | 0.0051 | 0.0043 | 0.164 | 0.41 | 0.0041 | 0.160 | 0.40 |
| MLP | 2 | 2 | 0.0052 | 0.0043 | 0.161 | 0.40 | 0.0041 | 0.161 | 0.35 |
| MLP | 3 | 1 | 0.0051 | 0.0042 | 0.166 | 0.28 | 0.0040 | 0.163 | 0.35 |
| MLP | 3 | 2 | 0.0055 | 0.0045 | 0.163 | 0.38 | 0.0043 | 0.167 | 0.38 |
| MLP | 4 | 1 | 0.0052 | 0.0041 | 0.171 | 0.28 | 0.0040 | 0.165 | 0.45 |
| MLP | 4 | 2 | 0.0056 | 0.0042 | 0.166 | 0.37 | 0.0040 | 0.171 | 0.28 |
| lightGBM | 1 | 1 | 0.0047 | 0.0046 | 0.140 | 0.50 | 0.0043 | 0.128 | 0.62 |
| lightGBM | 1 | 2 | 0.0058 | 0.0049 | 0.189 | 0.48 | 0.0044 | 0.237 | 0.39 |
| lightGBM | 2 | 1 | 0.0045 | 0.0046 | 0.138 | 0.46 | 0.0042 | 0.128 | 0.62 |
| lightGBM | 2 | 2 | 0.0067 | 0.0054 | 0.176 | 0.48 | 0.0050 | 0.227 | 0.35 |
| lightGBM | 3 | 1 | 0.0042 | 0.0044 | 0.137 | 0.50 | 0.0041 | 0.127 | 0.67 |
| lightGBM | 3 | 2 | 0.0055 | 0.0049 | 0.163 | 0.55 | 0.0045 | 0.222 | 0.35 |
| lightGBM | 4 | 1 | 0.0042 | 0.0044 | 0.136 | 0.50 | 0.0041 | 0.128 | 0.67 |
| lightGBM | 4 | 2 | 0.0047 | 0.0045 | 0.163 | 0.52 | 0.0042 | 0.216 | 0.37 |
Modify_bit is the number of fingerprint bits modified at random. Modify_type = 1 gives the performance when fingerprint bits with value 0 are set to 1, and Modify_type = 2 gives the performance for the reverse setting (bits with value 1 set to 0).
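The perturbation described in the note above can be sketched as follows. This is an illustration under assumptions, not the authors' code: it reads Modify_type = 1 as switching randomly chosen 0-valued bits to 1 and Modify_type = 2 as the reverse, and it represents a fingerprint as a plain list of 0/1 integers. The function name `perturb_fingerprint` is hypothetical.

```python
import random


def perturb_fingerprint(bits, n_bits, modify_type, rng):
    """Return a copy of `bits` with up to `n_bits` randomly chosen bits changed.

    Assumed meaning of modify_type:
      1 -> switch bits that are 0 to 1
      2 -> switch bits that are 1 to 0
    """
    source = 0 if modify_type == 1 else 1
    candidates = [i for i, b in enumerate(bits) if b == source]
    chosen = rng.sample(candidates, min(n_bits, len(candidates)))
    out = list(bits)
    for i in chosen:
        out[i] = 1 - source
    return out


fingerprint = [0, 1, 0, 0, 1, 0, 1, 0]
rng = random.Random(0)  # fixed seed so the example is repeatable

switched_on = perturb_fingerprint(fingerprint, 2, 1, rng)
switched_off = perturb_fingerprint(fingerprint, 2, 2, rng)
```

Repeating the perturbation with different seeds and averaging the resulting model metrics would mirror the "repeating ten times" protocol stated for the table.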
Model performance without an additional positive sample.
| Model | Train_mse | Valid1_mse | Valid2_mse | Test1_mse | Test2_mse | Valid2_ratio | Test2_ratio |
|---|---|---|---|---|---|---|---|
| MLP | 0.0026 | 0.0039 | 0.163 | 0.0039 | 0.189 | 0.33 | 0.33 |
| lightGBM | 0.0021 | 0.0035 | 0.230 | 0.0035 | 0.225 | 0.33 | 0.33 |
Model performance with the additional positive sample a6.
| Model | Train_mse | Valid1_mse | Valid2_mse | Test1_mse | Test2_mse | Valid2_ratio | Test2_ratio |
|---|---|---|---|---|---|---|---|
| MLP | 0.0037 | 0.0041 | 0.137 | 0.0040 | 0.102 | 0.67 | 0.5 |
| lightGBM | 0.0087 | 0.0067 | 0.132 | 0.0067 | 0.145 | 0.67 | 0.5 |
FIGURE 3. The relationship between lightGBM (A) and MLP (B) prediction score and IC50 value.
FIGURE 4. Gaussian distribution of compounds with high predicted scores.