Xiaochu Tong1,2, Dingyan Wang1,2, Xiaoyu Ding1,2, Xiaoqin Tan1,2, Qun Ren1,3, Geng Chen1,2,4, Yu Rong5, Tingyang Xu5, Junzhou Huang5, Hualiang Jiang1,2, Mingyue Zheng6,7, Xutong Li8,9.
Abstract
The blood-brain barrier (BBB) is a pivotal factor in central nervous system (CNS) drug development, and rapidly assessing the blood-brain barrier permeability (BBBp) of compounds in silico is of great value in the early drug discovery process. Here, we focus on whether and how uncertainty estimation methods improve in silico BBBp models. We briefly surveyed the current state of in silico BBBp prediction and of uncertainty estimation methods for deep learning models, and curated an independent dataset to assess the reliability of state-of-the-art algorithms. The results show that, although graph neural network-based deep learning models and conventional physicochemical-based machine learning models perform comparably on BBBp prediction, the GROVER-BBBp model improves greatly when uncertainty estimation is used. In particular, a strategy combining Entropy and MC-dropout raises the accuracy of distinguishing BBB+ from BBB- to above 99% when only high-confidence predictions (uncertainty score < 0.1) are retained. Case studies on preclinical/clinical drugs for Alzheimer's disease and on marketed antitumor drugs, verified against the literature, demonstrate the practical value of the uncertainty-enhanced BBBp prediction model, which may facilitate drug discovery for CNS diseases and metastatic brain tumors.
Keywords: BBBp prediction; Blood–brain barrier penetration; Uncertainty estimation
Year: 2022 PMID: 35799215 PMCID: PMC9264551 DOI: 10.1186/s13321-022-00619-2
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 8.489
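The abstract describes keeping only predictions whose uncertainty score falls below 0.1. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the Entropy score is the base-2 Shannon entropy of the predicted BBB+ probability (so it lies in [0, 1]) and that MC-dropout uncertainty is summarized as the spread of predictions over repeated stochastic forward passes. The function names and the toy probabilities are hypothetical, and how the paper combines the two scores is not stated in this record, so no combination is attempted.

```python
import math
from statistics import mean, pstdev


def binary_entropy(p: float) -> float:
    """Base-2 Shannon entropy of a Bernoulli(p) prediction.

    0.0 means a fully confident prediction, 1.0 means p = 0.5.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mc_dropout_summary(samples):
    """Mean prediction and population standard deviation over
    stochastic forward passes (dropout kept active at inference)."""
    return mean(samples), pstdev(samples)


def keep_confident(probs, threshold=0.1):
    """Indices of predictions whose entropy-based uncertainty is
    strictly below the threshold (0.1 is the cutoff in the abstract)."""
    return [i for i, p in enumerate(probs) if binary_entropy(p) < threshold]


# Hypothetical predicted BBB+ probabilities for four compounds.
probs = [0.995, 0.60, 0.003, 0.90]
confident = keep_confident(probs)

# Hypothetical MC-dropout samples for one compound.
mc_mean, mc_spread = mc_dropout_summary([0.80, 0.90, 1.00])
```

With these toy values only the first and third compounds pass the filter, since a probability of 0.90 still carries an entropy of roughly 0.47.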
Fig. 1 Analyzing molecules’ defects in M-data and the distribution of chemical space of S-data and M-data. a A list of defective molecules in M-data. b The distribution of max similarities inside M-data (blue) and max similarity of each molecule in S-data relative to M-data based on ECFP4 (red). c t-SNE distribution of M-data and S-data based on ECFP4
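The two distributions in panel b of Fig. 1 can be illustrated with a short sketch. It assumes each fingerprint is represented as a set of on-bit indices and that similarity is the Tanimoto coefficient, a common choice for ECFP4; the caption itself does not name the similarity measure. The toy bit sets and function names are hypothetical, and real ECFP4 fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity_to_reference(query, reference):
    """For each query fingerprint, the highest similarity to any
    reference fingerprint (S-data relative to M-data in Fig. 1b)."""
    return [max(tanimoto(q, r) for r in reference) for q in query]


def max_similarity_within(fps):
    """For each fingerprint, the highest similarity to any *other*
    fingerprint in the same set (inside M-data in Fig. 1b)."""
    return [
        max(tanimoto(fp, other) for j, other in enumerate(fps) if j != i)
        for i, fp in enumerate(fps)
    ]


# Hypothetical toy fingerprints, not real ECFP4 bit vectors.
m_data = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
s_data = [{1, 2, 3}, {10, 11}]

s_vs_m = max_similarity_to_reference(s_data, m_data)
within_m = max_similarity_within(m_data)
```

A low maximum similarity for an S-data molecule indicates that it lies far from anything in M-data, which is the sense in which S-data is an independent set.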
Fig. 2 Prediction performance on S-data by BBBp prediction models. Each bar and its error bar indicate the mean and variance, respectively, over 5 runs of the model. Statistical t-tests were applied between the model with the highest metric score and the others, and statistically significant results are marked (*p < 0.05)
Confusion matrix of model predictions on 27 substrates in S-data
| | RF(PCP) | MLP(PCP) | Attentive FP | GROVER |
|---|---|---|---|---|
| TP | 10 | 11 | 11 | 10 |
| FN | 3 | 2 | 2 | 3 |
| FP | 1 | 2 | 3 | 1 |
| TN | 13 | 12 | 11 | 13 |
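As a worked check of the confusion matrix above, the Matthews correlation coefficient (MCC) and accuracy on the 27 substrates can be computed directly from the four counts. The values produced here are derived arithmetic from the table, not figures reported in this record, and the variable names are illustrative.

```python
import math


def mcc(tp: int, fn: int, fp: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator vanishes (a row or column is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Fraction of correctly classified compounds."""
    return (tp + tn) / (tp + fn + fp + tn)


# Counts copied from the table, in the order (TP, FN, FP, TN).
confusion = {
    "RF(PCP)": (10, 3, 1, 13),
    "MLP(PCP)": (11, 2, 2, 12),
    "Attentive FP": (11, 2, 3, 11),
    "GROVER": (10, 3, 1, 13),
}

mcc_by_model = {name: mcc(*counts) for name, counts in confusion.items()}
acc_by_model = {name: accuracy(*counts) for name, counts in confusion.items()}
```

On these counts the MCC comes out at roughly 0.71 for RF(PCP) and GROVER, 0.70 for MLP(PCP), and 0.63 for Attentive FP, which is consistent with the comparable performance noted in the abstract.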
Fig. 3 Prediction performance by introducing different uncertainty estimation methods for BBBp prediction models. a The MCC curves for different uncertainty estimation methods in GROVER, namely Entropy, MC-dropout, Multi-initial, FPsDist, LatentDist and random method. The x-axis is the proportion of remaining compounds in S-data when the compounds with high uncertainty are sequentially discarded, and the y-axis is the corresponding MCC of the BBBp prediction model. The MCC_AUC is shown in parentheses. b The MCC curves for different uncertainty estimation methods in Attentive FP. c The MCC curves for different uncertainty estimation methods in MLP(PCP). d The MCC curves for different uncertainty estimation methods in RF(PCP)
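The curves described in Fig. 3 can be sketched as follows: compounds are ranked by uncertainty, the most uncertain are discarded step by step, and the MCC is recomputed on what remains. This is an illustrative sketch under stated assumptions. The retention grid, the trapezoidal rule for the area, and the convention of returning 0 when MCC is undefined are all assumptions, since the caption does not define how MCC_AUC is computed. The labels, predictions and uncertainties are hypothetical toy data.

```python
import math


def mcc_from_labels(y_true, y_pred) -> float:
    """MCC for binary labels (1 = BBB+, 0 = BBB-); 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_retention_curve(y_true, y_pred, uncertainty, steps=10):
    """(fraction retained, MCC) pairs, keeping the least uncertain
    compounds first."""
    order = sorted(range(len(y_true)), key=lambda i: uncertainty[i])
    curve = []
    for s in range(1, steps + 1):
        frac = s / steps
        k = max(1, round(frac * len(order)))
        kept = order[:k]
        score = mcc_from_labels([y_true[i] for i in kept],
                                [y_pred[i] for i in kept])
        curve.append((frac, score))
    return curve


def trapezoid_area(curve) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Hypothetical toy data: the two most uncertain predictions are wrong.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
uncertainty = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.80, 0.90]

curve = mcc_retention_curve(y_true, y_pred, uncertainty, steps=4)
area = trapezoid_area(curve)
```

A useful uncertainty score yields a curve that rises as the retained fraction shrinks, whereas a random ranking should stay roughly flat, which is why the random method serves as the baseline in the figure.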
Fig. 4 Prediction results from GROVER-BBBp model on S-data within different uncertainty ranges, and corresponding numbers of molecules. a Entropy method. b MC-dropout method. c Multi-initial method. d FPsDist method. e LatentDist method. f Random method
Model performance of various combinations of uncertainty estimation methods in GROVER-BBBp model
| Entropy | MC-dropout | Multi-initial | FPsDist | LatentDist | MCC |
|---|---|---|---|---|---|
| √ | | | | | 0.7938 |
| | √ | | | | 0.7764 |
| | | √ | | | 0.7737 |
| | | | √ | | 0.7008 |
| | | | | √ | 0.5383 |
| √ | √ | | | | |
| √ | √ | √ | 0.7879 | ||
| √ | √ | √ | 0.7956 | ||
| √ | √ | √ | 0.7771 |
The highest value is highlighted in bold
Fig. 5 Prediction performance by introducing ensemble uncertainty and t-SNE distribution for molecules in M-data and S-data. a The MCC curves of Entropy, MC-dropout and the ensemble of them. b Prediction results of molecules in S-data within different ranges of the ensemble uncertainty, and corresponding numbers of molecules. c t-SNE distribution of M-data based on latent representation of GROVER. d t-SNE distribution of S-data based on latent representation of GROVER, where the size of each point represents the uncertainty of the prediction. The larger the point, the smaller the uncertainty value
A list of prediction results with uncertainty of clinical drugs and marketed antitumor drugs
| Drug | Structure* | Predicted probability | Uncertainty | Potential indications |
|---|---|---|---|---|
| FPS-ZM1 | | 0.9761 | 0.1964 | Alzheimer’s disease |
| Tarenflurbil | | 0.7881 | 0.6328 | Alzheimer’s disease |
| Niraparib | | 0.9563 | 0.3676 | Carcinoma ovarian, fallopian tube cancer and peritoneal carcinoma |
| Alectinib | | 0.9484 | 0.3907 | Non-small cell lung cancer (NSCLC) metastatic, ALK-positive |
| Encorafenib | | 0.2529 | 0.7017 | Melanoma with BRAF mutation, colorectal cancer with BRAF V600 mutation |
| Osimertinib | | 0.4864 | 0.7486 | NSCLC advanced, metastatic, EGFR mutation |
*Structures of drugs used in model are stripped of chirality