Bifang He1, Juanjuan Kang1, Beibei Ru1, Hui Ding2, Peng Zhou2, Jian Huang2.
Abstract
Streptavidin is sometimes used as the intended target when screening phage-displayed combinatorial peptide libraries for streptavidin-binding peptides (SBPs). More often in biopanning, however, streptavidin serves only as an anchoring molecule that efficiently captures the biotinylated target. In that case, SBPs that creep into the biopanning results are not the desired binders but target-unrelated peptides (TUPs), and treating them as intended binders may mislead subsequent studies. It is therefore important to determine whether a peptide is likely to be an SBP, whether streptavidin is the intended target or just the anchoring molecule. In this paper, we describe an SVM-based ensemble predictor called SABinder, the first predictor for SBPs. The model was built on optimized dipeptide composition features and correctly classified 89.20% of peptides (MCC = 0.78; AUC = 0.93; permutation test, p < 0.001). SABinder is freely accessible as a web server. The tool provides an efficient way to exclude potential SBPs when they are TUPs, or to help identify possibly new SBPs when they are the desired binders. In either case, it should benefit the related scientific community.
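As a rough illustration of the dipeptide composition feature named in the abstract, the sketch below counts overlapping residue pairs in a peptide and normalizes them into a 400-dimensional frequency vector. The function name `dipeptide_composition`, the example peptide, and the choice to skip pairs containing non-standard residues are assumptions made for illustration. The feature-selection step that produces the optimized set is not described in this record and is not shown.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(peptide):
    """Return a 400-element list of dipeptide frequencies for one peptide.

    Assumption: pairs containing a non-standard residue are skipped, and
    frequencies are normalized by the number of pairs actually counted.
    """
    peptide = peptide.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical example peptide, chosen only to show the output shape.
vec = dipeptide_composition("HPQGHPQ")
```

Here `vec` has one entry per dipeptide in `DIPEPTIDES`, and the entries sum to 1 for any peptide with at least one valid pair.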
Year: 2016 PMID: 27610387 PMCID: PMC5005764 DOI: 10.1155/2016/9175143
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Flowchart of dataset construction. The training dataset and two independent testing datasets were constructed according to this flowchart.
Number of positive and negative peptides in each dataset.
| Dataset | Number of positive peptides | Number of negative peptides | Length distribution (mean ± std) |
|---|---|---|---|
| Training dataset | 199 | 1990 | 9 ± 3.49 |
| NDFT dataset | 0 | 13272 | 9 ± 3.24 |
| SAART dataset | — | — | 10 ± 4.18 |
SAART dataset: the numbers of positive peptides and negative peptides are not determined.
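The training dataset holds ten times as many negatives as positives (1990 versus 199). One common way to train an ensemble on such data is to split the negatives into disjoint groups the size of the positive set and pair each group with all positives. This record does not state how the SABinder submodels were built, so the sketch below is an assumption, and the function name `balanced_subsets` and the placeholder peptide labels are hypothetical.

```python
import random


def balanced_subsets(positives, negatives, seed=0):
    """Split negatives into disjoint chunks, each paired with all positives.

    Assumption: each subset is 1:1 balanced; leftover negatives that do not
    fill a whole chunk are dropped.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(positives)
    n_subsets = len(shuffled) // size
    return [
        (list(positives), shuffled[i * size:(i + 1) * size])
        for i in range(n_subsets)
    ]


# Placeholder identifiers standing in for real peptide sequences.
positives = [f"pos_{i}" for i in range(199)]
negatives = [f"neg_{i}" for i in range(1990)]
subsets = balanced_subsets(positives, negatives)
```

With the dataset sizes from the table, this yields ten balanced subsets, each of which could train one submodel whose votes are then combined.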
Performances of SVM-based models trained with different features.
| Feature | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Amino acid composition (AAC) | 79.35 ± 1.96 | 78.79 ± 2.65 | 79.07 ± 1.75 | 0.58 ± 0.04 |
| Optimized amino acid composition (OAAC) | 78.14 ± 3.9 | 82.31 ± 4.45 | 80.23 ± 1.42 | 0.61 ± 0.03 |
| Dipeptide composition (DPC) | 79.14 ± 3.50 | 91.26 ± 1.92 | 85.20 ± 1.40 | 0.71 ± 0.03 |
| Optimized dipeptide composition (ODPC) | | | | |
Std: standard deviation.
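The column headings Sn, Sp, Acc, and MCC conventionally stand for sensitivity, specificity, accuracy, and the Matthews correlation coefficient. The sketch below computes them from confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the example counts are hypothetical and do not come from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on positives
    sp = tn / (tn + fp)                    # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is returned by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Made-up counts, used only to show the call.
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
```

Unlike accuracy, MCC uses all four counts, which makes it a more informative summary when the classes are imbalanced.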
The prediction performances of various machine learning methods.
| Machine learning methods | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Support vector machine | | | | |
| Naïve Bayes | 78.85 ± 3.90 | 77.40 ± 1.73 | 78.11 ± 2.47 | 0.56 ± 0.05 |
| Random Forest | 84.80 ± 2.30 | 88.00 ± 5.22 | 86.41 ± 2.41 | 0.73 ± 0.05 |
| Decision Tree J48 | 76.90 ± 1.10 | 88.24 ± 4.31 | 82.57 ± 2.00 | 0.66 ± 0.04 |
| RBF network | 79.00 ± 4.18 | 78.50 ± 2.33 | 78.74 ± 2.57 | 0.58 ± 0.05 |
| Logistic Function | 76.40 ± 3.22 | 67.83 ± 3.82 | 72.11 ± 3.23 | 0.44 ± 0.06 |
Std: standard deviation.
Figure 2. ROC curves for model tuning and five permutations. AUC_MT and AUC_P denote the AUC for model tuning and the average AUC over five permutations, respectively. For all 10 submodels, the AUC for model tuning is much higher than the AUC for the permuted data, indicating strong predictive performance. For clarity, only five ROC curves, from five of the 1000 permutations, were plotted.
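The idea behind comparing a real AUC with permuted ones can be sketched as a label-permutation test. The version below is a simplified variant that shuffles labels against a fixed set of scores instead of retraining a model on each permutation, so it is not necessarily the procedure used for SABinder. The function names, the default of 1000 permutations, and the toy data are assumptions for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels reach the observed AUC."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy data: ten positives scored above ten negatives.
labels = [1] * 10 + [0] * 10
scores = list(range(20, 0, -1))
observed_auc, p_value = permutation_p_value(labels, scores)
```

With 1000 permutations, the smallest p-value this estimator can report is 1/1001, which is consistent with reporting a result as p < 0.001.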