Xue Chen¹, Qianyue Zhang¹, Bowen Li¹, Chunying Lu¹, Shanshan Yang¹, Jinjin Long¹, Bifang He¹, Heng Chen¹, Jian Huang².
Abstract
The blood-brain barrier (BBB) is a major obstacle to drug delivery into the brain in the treatment of central nervous system (CNS) diseases. Blood-brain barrier penetrating peptides (BBPs), a class of peptides that can cross the BBB through various mechanisms without damaging it, are effective drug candidates for CNS diseases. However, identifying BBPs experimentally is time-consuming and laborious. To discover more BBPs as drugs for CNS diseases, computational methods that can quickly and accurately distinguish BBPs from non-BBPs are urgently needed. In the present study, we created a training dataset of 326 BBPs, derived from existing databases and published articles, and 326 non-BBPs collected from UniProt, and used it to construct a sequence-based BBP predictor. We also constructed an independent testing dataset of 99 BBPs and 99 non-BBPs. Multiple machine learning methods were compared on the training dataset using nested cross-validation. The final BBP predictor was constructed on the training dataset, and the results showed that the random forest (RF) method outperformed the other classification algorithms on both the training and the independent testing datasets. The RF-based predictor, named BBPpredict, performs considerably better than existing state-of-the-art BBP prediction tools. BBPpredict is expected to contribute to the discovery of novel BBPs, or at least to serve as a useful complement to existing methods in this area. BBPpredict is freely available at http://i.uestc.edu.cn/BBPpredict/cgi-bin/BBPpredict.pl.
Keywords: blood-brain barrier; blood-brain barrier penetrating peptides (BBPs); computational method; nested cross-validation; random forest (RF)
Year: 2022 PMID: 35656322 PMCID: PMC9152268 DOI: 10.3389/fgene.2022.845747
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
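The abstract states that the classifiers were compared with nested cross-validation. The pure-Python sketch below illustrates the idea under stated assumptions: stratified five-fold splits in both loops, plain accuracy as the selection criterion, and a toy k-nearest-neighbour classifier whose k is tuned in the inner loop. The helper names (`stratified_folds`, `knn_fit_predict`, `nested_cv`) are hypothetical and this is not the authors' implementation; the point is only that each outer test fold is never used when a hyperparameter is chosen.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Deal the indices of each class round-robin into k folds (class ratio kept per fold)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def knn_fit_predict(X_tr: Sequence[Vector], y_tr: Sequence[int],
                    X_te: Sequence[Vector], k: int) -> List[int]:
    """Hypothetical toy classifier: majority vote of the k nearest training points."""
    preds = []
    for x in X_te:
        order = sorted(range(len(X_tr)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(X_tr[i], x)))
        votes = sum(y_tr[i] for i in order[:k])
        preds.append(1 if 2 * votes > k else 0)
    return preds


def nested_cv(X, y, params, fit_predict, k_outer=5, k_inner=5, seed=0) -> List[float]:
    """Outer folds estimate accuracy; the hyperparameter is picked on inner folds only."""
    outer_acc = []
    outer = stratified_folds(y, k_outer, seed)
    for o, test_idx in enumerate(outer):
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        inner = stratified_folds(y_tr, k_inner, seed + 1)

        def inner_accuracy(p) -> float:
            correct = 0
            for v, val_idx in enumerate(inner):
                fit_idx = [i for f, fold in enumerate(inner) if f != v for i in fold]
                preds = fit_predict([X_tr[i] for i in fit_idx], [y_tr[i] for i in fit_idx],
                                    [X_tr[i] for i in val_idx], p)
                correct += sum(pr == y_tr[i] for pr, i in zip(preds, val_idx))
            return correct / len(y_tr)

        best = max(params, key=inner_accuracy)  # outer test fold is not seen here
        preds = fit_predict(X_tr, y_tr, [X[i] for i in test_idx], best)
        outer_acc.append(sum(pr == y[i] for pr, i in zip(preds, test_idx)) / len(test_idx))
    return outer_acc


if __name__ == "__main__":
    X = [[float(i)] for i in range(40)]  # toy 1-D data, classes split at 19.5
    y = [0] * 20 + [1] * 20
    print(nested_cv(X, y, params=[1, 3, 5], fit_predict=knn_fit_predict))
```

Any classifier with the same `fit_predict(X_train, y_train, X_test, param)` shape can be dropped in, which is how several methods could be compared on an equal footing.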
Composition of the training and independent testing datasets.
| Dataset | Number of BBPs | Number of Non-BBPs |
|---|---|---|
| Training dataset | 326 | 326 |
| Independent testing dataset | 99 | 99 |
FIGURE 1 The framework of BBPpredict. (A) Dataset construction. (B) Feature extraction. (C) Feature selection. (D) Model construction. (E) Model evaluation. (F) Web service.
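For the feature extraction and feature selection steps, the performance tables list F-score as the scoring method. The sketch below assumes amino acid composition (AAC) as an example sequence feature, which the source does not name, and uses the widely used binary F-score: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. Features are then ranked from most to least discriminative. The function names are hypothetical.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue in the peptide."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def f_scores(pos: Sequence[Sequence[float]], neg: Sequence[Sequence[float]]) -> List[float]:
    """F-score per feature: between-class separation over summed within-class variance."""
    n_pos, n_neg = len(pos), len(neg)
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        mean_p, mean_q = sum(p) / n_pos, sum(q) / n_neg
        mean_all = (sum(p) + sum(q)) / (n_pos + n_neg)
        between = (mean_p - mean_all) ** 2 + (mean_q - mean_all) ** 2
        within = (sum((v - mean_p) ** 2 for v in p) / (n_pos - 1)
                  + sum((v - mean_q) ** 2 for v in q) / (n_neg - 1))
        if within > 0:
            scores.append(between / within)
        else:
            # constant inside each class: perfectly separating only if the means differ
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Feature indices ordered from highest to lowest F-score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    pos = [aac(s) for s in ["RRRGA", "RRGAA", "RRRRA"]]  # made-up arginine-rich "BBPs"
    neg = [aac(s) for s in ["GGGRA", "GGAAR", "GGGGA"]]  # made-up glycine-rich "non-BBPs"
    top = rank_features(f_scores(pos, neg))[:2]
    print([AMINO_ACIDS[j] for j in top])
```

A feature subset for model construction would then be taken from the top of this ranking, with the subset size itself tuned by cross-validation.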
Prediction performance of different classifiers in nested five-fold cross-validation.
| Scoring method | Classifier | SN (%) | SP (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| F-score | KNN | 76.69 | 80.98 | 78.83 | 0.5772 | 0.7883 |
| F-score | rbfSVM | 78.83 | 83.13 | 80.98 | 0.6202 | 0.8872 |
| F-score | linearSVM | 75.77 | 83.13 | 79.45 | 0.5906 | 0.8690 |
| F-score | DT | 71.78 | 74.54 | 73.16 | 0.4634 | 0.7357 |
| F-score | LSTM | 65.23 | 75.38 | 70.31 | 0.4083 | 0.7313 |
| F-score | AdaBoost | 77.91 | 80.67 | 79.29 | 0.5861 | 0.8615 |
| F-score | GentleBoost | 77.30 | 80.06 | 78.68 | 0.5738 | 0.8582 |
| F-score | LogitBoost | 79.14 | 82.21 | 80.67 | 0.6138 | 0.8680 |
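The tables report sensitivity (SN), specificity (SP), accuracy (ACC) and the Matthews correlation coefficient (MCC). Assuming the standard confusion-matrix definitions with BBPs as the positive class, they can be computed as below; `binary_metrics` is a hypothetical helper name. As a check, the counts back-calculated from the BBPpred row of the predictor comparison table (67 of 99 BBPs and 65 of 99 non-BBPs predicted correctly) reproduce that row.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """SN, SP and ACC in percent plus MCC, from confusion-matrix counts (BBP = positive)."""
    sn = tp / (tp + fn)                      # fraction of BBPs recovered
    sp = tn / (tn + fp)                      # fraction of non-BBPs rejected
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if a margin is empty
    return {"SN": 100 * sn, "SP": 100 * sp, "ACC": 100 * acc, "MCC": mcc}


if __name__ == "__main__":
    # Counts back-calculated from the BBPpred row (99 BBPs and 99 non-BBPs)
    m = binary_metrics(tp=67, fn=32, tn=65, fp=34)
    print({k: round(v, 4 if k == "MCC" else 2) for k, v in m.items()})
    # → {'SN': 67.68, 'SP': 65.66, 'ACC': 66.67, 'MCC': 0.3334}
```

Because the test set is balanced, ACC here is simply the mean of SN and SP; MCC additionally penalizes a classifier that trades one class off heavily against the other.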
FIGURE 2 Performance evaluation of different predictors in five-fold cross-validation and on the independent testing dataset. (A) ROC curves of the five-fold cross-validation. (B) ROC curves on the independent testing dataset.
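Figure 2 summarizes each classifier as an ROC curve, and the tables report the area under it (AUC). A minimal sketch, assuming each classifier outputs a score where higher means more BBP-like: lower the threshold through every distinct score, record the (false positive rate, true positive rate) point at each step, and integrate with the trapezoidal rule. Tied scores form a single step. The function names are hypothetical. With hard 0/1 predictions instead of scores, the curve has one interior point and the AUC reduces to (SN + SP) / 2.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:  # tied scores move together
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Point]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]  # made-up predicted probabilities
    labels = [1, 1, 1, 0, 0, 0]
    print(auc_trapezoid(roc_points(scores, labels)))
```

Here 8 of the 9 (BBP, non-BBP) pairs are ranked correctly, so the AUC is 8/9; the trapezoidal area always equals this pairwise ranking probability when ties count as one half.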
Prediction performance of different classifiers on the independent testing dataset.
| Scoring method | Classifier | SN (%) | SP (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| F-score | rbfSVM | 78.79 | 73.74 | 76.26 | 0.5259 | 0.8241 |
| F-score | KNN | 70.71 | 66.67 | 68.69 | 0.3740 | 0.6869 |
| F-score | DT | 69.70 | 61.62 | 65.66 | 0.3142 | 0.6574 |
| F-score | linearSVM | 64.65 | 74.75 | 69.70 | 0.3960 | 0.7656 |
| F-score | LSTM | 58.59 | 63.64 | 61.11 | 0.2225 | 0.6041 |
| F-score | AdaBoost | 64.65 | 68.69 | 66.67 | 0.3336 | 0.7389 |
| F-score | GentleBoost | 74.75 | 66.67 | 70.71 | 0.4155 | 0.7831 |
| F-score | LogitBoost | 67.68 | 77.78 | 72.73 | 0.4569 | 0.7798 |
Comparison of the datasets used by three BBP predictors.
| | BBPpred | B3Pred | BBPpredict |
|---|---|---|---|
| Data source | Positive: Brainpeps, PepBank, articles, SATPdb; Negative: UniProt | Positive: B3Pdb; Negative: UniProt | Positive: Brainpeps, B3Pdb, BBPpred, B3Pred, articles; Negative: UniProt |
| Article search deadline | 22 July 2020 | Nov. 2021 | |
| Number of articles | 7 | 271 | 300 |
| Number of positive samples | 119 (training: 100, testing: 19) | 269 (training: 215, testing: 54) | 425 (training: 326, testing: 99) |
| Number of negative samples | 119 (training: 100, testing: 19) | 2,690 (training: 2,152, testing: 538) | 425 (training: 326, testing: 99) |
| Peptide length (residues) | 5–50 | 6–30 | 5–50 |
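The comparison shows that BBPpredict uses equal numbers of BBPs and UniProt non-BBPs (425 each) and keeps peptides of 5–50 residues. The sketch below assumes a simple way to assemble such a dataset: filter both sets by length and to the 20 standard residues, drop candidate negatives that also occur as positives, randomly sample as many negatives as positives, and hold out a fixed number per class for independent testing (99 in BBPpredict). Any redundancy reduction or other curation the authors applied is not modelled, and `keep` and `build_balanced_split` are hypothetical names.

```python
import random
from typing import List, Sequence, Set, Tuple

STANDARD: Set[str] = set("ACDEFGHIKLMNPQRSTVWY")
Labeled = List[Tuple[str, int]]  # (sequence, 1 = BBP / 0 = non-BBP)


def keep(seq: str, min_len: int = 5, max_len: int = 50) -> bool:
    """Length within [min_len, max_len] and only standard residues."""
    return min_len <= len(seq) <= max_len and set(seq) <= STANDARD


def build_balanced_split(positives: Sequence[str], candidate_negatives: Sequence[str],
                         n_test_per_class: int, seed: int = 0) -> Tuple[Labeled, Labeled]:
    """Return (train, test) with a 1:1 class ratio in both parts."""
    rng = random.Random(seed)
    pos = sorted({s for s in positives if keep(s)})  # deduplicated BBPs
    neg_pool = sorted({s for s in candidate_negatives if keep(s)} - set(pos))
    if len(neg_pool) < len(pos):
        raise ValueError("not enough candidate negatives to balance the classes")
    neg = rng.sample(neg_pool, len(pos))  # one negative per positive
    rng.shuffle(pos)
    test = [(s, 1) for s in pos[:n_test_per_class]] + [(s, 0) for s in neg[:n_test_per_class]]
    train = [(s, 1) for s in pos[n_test_per_class:]] + [(s, 0) for s in neg[n_test_per_class:]]
    return train, test


if __name__ == "__main__":
    bbps = ["K" * 5 + "A" * i for i in range(10)] + ["KK", "KKXKK"]  # made-up sequences
    pool = ["G" * 5 + "S" * i for i in range(30)]
    train, test = build_balanced_split(bbps, pool, n_test_per_class=3)
    print(len(train), len(test))
```

Sampling negatives to match the positives keeps ACC and MCC comparable across methods, whereas an imbalanced set such as B3Pred's 1:10 ratio would make ACC dominated by the majority class.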
Prediction performance of different predictors on the independent testing dataset.
| Predictor | SN (%) | SP (%) | ACC (%) | MCC |
|---|---|---|---|---|
| BBPpred | 67.68 | 65.66 | 66.67 | 0.3334 |
| B3Pred | 70.71 | 64.65 | 67.68 | 0.3542 |
FIGURE 3 Web interface of BBPpredict. (A) The input page, where the query sequences and the probability threshold (tp) are submitted. (B) The result page returned by BBPpredict.
Performance of BBPpredict on the independent testing dataset as tp varies.
| tp | SN (%) | SP (%) | ACC (%) | MCC |
|---|---|---|---|---|
| 0.1 | 100 | 11.11 | 55.56 | 0.2425 |
| 0.2 | 98.99 | 29.29 | 64.14 | 0.3944 |
| 0.3 | 94.95 | 44.44 | 69.70 | 0.4564 |
| 0.4 | 86.87 | 64.65 | 75.76 | 0.5284 |
| 0.5 | 76.77 | 77.78 | 77.27 | 0.5455 |
| 0.6 | 58.59 | 82.83 | 70.71 | 0.4269 |
| 0.7 | 45.45 | 90.91 | 68.18 | 0.4082 |
| 0.8 | 36.36 | 96.97 | 66.67 | 0.4191 |
| 0.9 | 13.13 | 97.98 | 55.56 | 0.2100 |
| 0.95 | 5.05 | 97.98 | 51.51 | 0.0820 |
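The table shows the usual threshold trade-off: lowering tp raises SN at the cost of SP, and the balanced optimum on this test set lies at tp = 0.5. The sketch below reproduces such a sweep on made-up probabilities, assuming a peptide is called a BBP when its predicted probability is at least tp (whether the server compares with ≥ or > is not stated in the source). `sweep_thresholds` is a hypothetical name; the confusion counts are named `n_tp`, `n_fn`, and so on to avoid a clash with the threshold tp.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float, float, float]  # (tp, SN%, SP%, ACC%, MCC)


def sweep_thresholds(probs: Sequence[float], labels: Sequence[int],
                     thresholds: Sequence[float]) -> List[Row]:
    """For each threshold, call 'BBP' when prob >= threshold and score the calls."""
    rows = []
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        pairs = list(zip(labels, preds))
        n_tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)  # true positives
        n_fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)
        n_tn = sum(1 for y, yh in pairs if y == 0 and yh == 0)
        n_fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
        denom = math.sqrt((n_tp + n_fp) * (n_tp + n_fn) * (n_tn + n_fp) * (n_tn + n_fn))
        mcc = (n_tp * n_tn - n_fp * n_fn) / denom if denom else 0.0
        rows.append((t, 100 * n_tp / (n_tp + n_fn), 100 * n_tn / (n_tn + n_fp),
                     100 * (n_tp + n_tn) / len(labels), mcc))
    return rows


if __name__ == "__main__":
    probs = [0.95, 0.70, 0.45, 0.20, 0.60, 0.10]  # made-up predicted probabilities
    labels = [1, 1, 1, 0, 0, 0]
    for row in sweep_thresholds(probs, labels, [0.1, 0.5, 0.96]):
        print(row)
```

At the lowest threshold every peptide is called a BBP (SN 100%, SP 0%), mirroring the tp = 0.1 row above; at the highest nothing is, so a user can pick tp to favour either fewer missed BBPs or fewer false leads.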