| Literature DB >> 35592264 |
Rajnish Kumar, Anju Sharma, Athanasios Alexiou, Anwar L Bilgrami, Mohammad Amjad Kamal, Ghulam Md Ashraf.
Abstract
The blood-brain barrier (BBB) is a selective and semipermeable boundary that maintains homeostasis inside the central nervous system (CNS). The BBB permeability of compounds is an important consideration during CNS-acting drug development and is difficult to express in a succinct manner. Clinical experiments are the most accurate method of measuring BBB permeability. However, they are time-consuming and labor-intensive. Therefore, numerous efforts have been made to predict the BBB permeability of compounds using computational methods. However, the accuracy of BBB permeability prediction models has always been an issue. To improve the accuracy of BBB permeability prediction, we applied deep learning and machine learning algorithms to a dataset of 3,605 diverse compounds. Each compound was encoded with 1,917 features comprising 1,444 physicochemical (1D and 2D) properties, 166 molecular access system (MACCS) fingerprints, and 307 substructure fingerprints. The prediction performance metrics of the developed models were compared and analyzed. The prediction accuracy of the deep neural network (DNN), one-dimensional convolutional neural network (CNN-1D), and convolutional neural network trained by transfer learning was found to be 98.07, 97.44, and 97.61%, respectively. The best performing DNN-based model was selected for the development of the "DeePred-BBB" model, which can predict the BBB permeability of compounds from their simplified molecular input line entry system (SMILES) notations. It could be useful for screening compounds by BBB permeability at the preliminary stages of drug development. DeePred-BBB is made available at https://github.com/12rajnish/DeePred-BBB.
Keywords: CNS-permeability; blood-brain barrier; convolutional neural network; deep learning; machine learning; prediction
Year: 2022 PMID: 35592264 PMCID: PMC9112838 DOI: 10.3389/fnins.2022.858126
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
FIGURE 1 The BBB and its permeability mechanism (Saxena et al., 2021).
The final dataset and its distribution.
| Dataset | BBB permeable compounds | BBB non-permeable compounds | Total |
|  | 819 | 366 | 1,185 |
|  | 1,398 | 393 | 1,791 |
|  | 390 | 239 | 629 |
| Total | 2,607 | 998 | 3,605 |
Distribution of the dataset in the training and test sets.
| Dataset | BBB permeable compounds | BBB non-permeable compounds | Total |
| Training set | 1,955 | 749 | 2,704 |
| Test set | 652 | 249 | 901 |
| Total | 2,607 | 998 | 3,605 |
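The counts in the two tables above can be cross-checked directly; the split works out to roughly 75% training and 25% test:

```python
# Consistency check on the dataset split reported above: 2,607 permeable +
# 998 non-permeable = 3,605 compounds, divided into 2,704 training and
# 901 test compounds (about a 75/25 split).
train = {"permeable": 1955, "non_permeable": 749}
test = {"permeable": 652, "non_permeable": 249}

train_total = sum(train.values())   # 2,704
test_total = sum(test.values())     # 901
grand_total = train_total + test_total

assert train_total == 2704 and test_total == 901 and grand_total == 3605
assert train["permeable"] + test["permeable"] == 2607
assert train["non_permeable"] + test["non_permeable"] == 998
print(f"training fraction: {train_total / grand_total:.2%}")  # → 75.01%
```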
Hyperparameter values explored for the DNN model.
| Parameter | Values |
| Number of hidden layers | 1–5 |
| Number of neurons | 100, 200, 300, 500, 800 |
| Dropout ratio | 0.1, 0.2, 0.3, 0.4, 0.5 |
| Learning rate | 0.0001, 0.0002, 0.0003, 0.001, 0.002, 0.003 |
| Epochs | 100, 200, 400, 500, 800 |
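Enumerated exhaustively, the grid above spans 5 × 5 × 5 × 6 × 5 = 3,750 configurations. The paper does not state which search strategy was used over these values, so treat this as an upper bound on the search space; the sketch below just enumerates the grid with `itertools.product`.

```python
# Enumerate the DNN hyperparameter grid listed in the table above.
# Whether the authors searched it exhaustively is not stated; this only
# counts the possible combinations.
from itertools import product

grid = {
    "hidden_layers": [1, 2, 3, 4, 5],
    "neurons": [100, 200, 300, 500, 800],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "learning_rate": [0.0001, 0.0002, 0.0003, 0.001, 0.002, 0.003],
    "epochs": [100, 200, 400, 500, 800],
}

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
assert len(configs) == 3750  # 5 * 5 * 5 * 6 * 5
```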
Explored hyperparameter values for the CNN-1D model.
| Parameter | Values |
| Number of filters | 15, 32, 64 |
| Number of dense layers | 1, 2 |
| Dropout ratio | 0.2, 0.3, 0.4 |
| Learning rate | 0.0001, 0.0002, 0.0003, 0.001, 0.002, 0.003 |
| Epochs | 100, 200, 400, 500, 600 |
Hyperparameter values for the CNN (VGG16) model.
| Parameter | VGG16 |
| Convolutional blocks | Convolutional layers, kernel size, filters, max-pooling, zero padding: predefined |
| Dense layers | 2 |
| Neurons per dense layer | 150, 104 |
| Dropout ratio | 0.5 |
| Learning rate | 0.02 |
| Batch size | 132 |
| Epochs | 800 |
FIGURE 2 Methodology adopted for predicting the BBB permeability of compounds using SMILES notation. The SMILES notations were used to calculate molecular properties and fingerprints using PaDEL. These features were used as input to the DNN and CNN-1D to generate the BBB permeability prediction models. The 2D images of compounds were generated using RDKit and fed to the CNN (VGG16) to generate the BBB permeability prediction model.
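The pipeline in Figure 2 derives fixed-length fingerprints from SMILES strings with PaDEL. As a self-contained stand-in (neither PaDEL nor RDKit is used here), the sketch below hashes SMILES substrings into a 166-bit vector; it has the same shape as a MACCS fingerprint but is NOT the real MACCS key definition, which assigns each bit to a specific substructure.

```python
# Toy fingerprint: hash every length-3 substring of a SMILES string into a
# fixed 166-bit vector. Illustrates the SMILES -> bit-vector step only;
# real MACCS keys are substructure-defined, not hashed.
import hashlib

def hashed_fingerprint(smiles: str, n_bits: int = 166) -> list:
    bits = [0] * n_bits
    for i in range(len(smiles) - 2):
        fragment = smiles[i:i + 3]
        digest = hashlib.md5(fragment.encode()).hexdigest()
        bits[int(digest, 16) % n_bits] = 1  # set the bit this fragment hashes to
    return bits

fp = hashed_fingerprint("CC(=O)Oc1ccccc1C(=O)O")  # aspirin SMILES
assert len(fp) == 166 and set(fp) <= {0, 1}
```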
Performance metrics of ML and DL algorithms.
| Algorithm | AUC | AUPRC | AP | F1 | A (%) | HD | FPR (%) | FNR (%) |
| SVM (RBF) | 0.964 | 0.988 | 0.975 | 0.985 | 96.29 | 0.022 | 6.451 | 0.724 |
| SVM (Polynomial) | 0.948 | 0.965 | 0.965 | 0.98 | 96.01 | 0.029 | 9.756 | 0.579 |
| SVM (Sigmoid) | 0.921 | 0.971 | 0.944 | 0.962 | 94.45 | 0.055 | 13.359 | 2.519 |
| SVM (Linear) | 0.916 | 0.969 | 0.938 | 0.963 | 94.56 | 0.054 | 15.242 | 1.497 |
| NB | 0.844 | 0.948 | 0.899 | 0.935 | 90.18 | 0.098 | 14.543 | 3.202 |
| kNN (3) | 0.927 | 0.974 | 0.949 | 0.968 | 95.3 | 0.047 | 12.891 | 1.615 |
| RF (3, 20) | 0.815 | 0.943 | 0.887 | 0.938 | 90.29 | 0.0971 | 26.666 | 1.471 |
| DNN | 0.992 |  | 0.997 | 0.987 | 98.07 |  |  |  |
| CNN-1D | 0.969 | 0.956 | 0.975 | 0.983 | 97.44 | 0.026 | 4.118 | 2.017 |
| CNN (VGG16) | 0.972 | 0.983 | 0.983 | 0.946 | 97.61 | 0.0804 | 4.581 | 2.326 |
AUC, area under curve; AUPRC, area under precision-recall curve; AP, average precision; F1, F1 score; A, accuracy; HD, Hamming distance; FPR, false positive rate; FNR, false negative rate; SVM, support vector machine; RBF, radial basis function; d, degree; NB, naïve Bayes; kNN, k-nearest neighbor; RF, random forest; DNN, deep neural network; CNN-1D, one-dimensional convolutional neural network; CNN (VGG16), convolutional neural network - visual geometry group 16. The best performing model (DNN) is highlighted in bold in the original table.
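Several of the tabulated metrics (A, FPR, FNR, F1) are derived from a binary confusion matrix. The counts below are made-up illustrations (the paper reports only the derived metrics), but the formulas match the abbreviations in the key above:

```python
# How accuracy, FPR, FNR, and F1 follow from a confusion matrix.
# The counts are hypothetical; they sum to the 901-compound test set size
# purely for illustration.
tp, fn = 640, 12   # BBB-permeable test compounds: correct / missed
tn, fp = 238, 11   # BBB non-permeable test compounds: correct / missed

accuracy = (tp + tn) / (tp + tn + fp + fn)
fpr = fp / (fp + tn)               # false positive rate
fnr = fn / (fn + tp)               # false negative rate
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"A = {accuracy:.2%}, FPR = {fpr:.2%}, FNR = {fnr:.2%}, F1 = {f1:.3f}")
```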
FIGURE 3 ROCs of the best performing models (CNN, SVM, DNN, and CNN-1D) and their respective AUCs.
FIGURE 4 Accuracy and loss plots of the DNN model for the prediction of BBB permeability.
Comparative analysis of DeePred-BBB with recently published BBB permeability prediction models.
| Algorithm | Dataset | Prediction performance | Study group |
| SVM (RBF) | 1,562 compounds (BBB+ = 694, BBB− = 868) | >85% accuracy, sensitivity, and specificity |  |
| Decision trees | 581 compounds | Accuracy = 87.93%, Sensitivity = 86.67%, Specificity = 89.29% |  |
| SVM (RBF) | 1,990 compounds (BBB permeable = 1,550, BBB non-permeable = 440) | Accuracy = 93.96%, Sensitivity = 94.3%, Specificity = 91.0%, MCC = 0.84 |  |
| SVM, kNN | 2,358 compounds | Accuracy = 96.6%, Sensitivity = 92.5%, Specificity = 89.9% |  |
| DL | 462 compounds (BBB permeable = 250, BBB non-permeable = 212) | Accuracy = 97%, AUC = 0.98, F1 = 0.92 |  |
| RNN | 2,342 compounds | Accuracy = 96.53%, Sensitivity = 94.91%, Specificity = 98.09%, MCC = 0.931, AUC = 0.986 |  |
| Light gradient boosting machine | 7,162 compounds (BBB permeable = 5,453, BBB non-permeable = 1,709) | Accuracy = 90%, Sensitivity = 85%, Specificity = 94% |  |
| RF, multilayer perceptron, sequential minimal optimization | 605 compounds (training) + 1,566 compounds (validation) | Accuracy = 86.5% |  |
| SVM (RBF) | 1,978 compounds (BBB permeable = 1,550, BBB non-permeable = 440) | Accuracy = 96.77%, AUC = 0.964, F1 = 0.975 |  |
| DNN (DeePred-BBB) | 3,605 compounds (BBB permeable = 2,607, BBB non-permeable = 998) | Accuracy = 98.07%, AUC = 0.992, AP = 0.997, F1 = 0.987 | Current study |
AUC, area under curve; AUPRC, area under precision-recall curve; AP, average precision; BBB, blood brain barrier; DL, deep learning; F1, F1 score; kNN, k-nearest neighbor; MCC, Matthews correlation coefficient; RBF, radial basis function; RF, random forest; RNN, recurrent neural network; SVM, support vector machine.