Haijiang Geng, Tao Lu, Xiao Lin, Yu Liu, Fangrong Yan.
Abstract
Proteins function through interactions with other proteins and biomolecules, and these interactions occur at the so-called interface residues of the protein sequence. Identifying interface residues helps us better understand the biological mechanisms of protein interaction. Information about interface residues also contributes to the understanding of metabolic and signal transduction networks and indicates directions for drug design. In recent years, researchers have focused on developing new computational methods for predicting protein interface residues. Here we used a 181-dimensional protein sequence feature vector as input to a Naive Bayes Classifier (NBC)-based method to predict interaction sites in protein-protein complexes. The prediction of interaction sites is treated as a binary classification problem over amino acid residues, solved by applying NBC to protein sequence features. Independent test results suggest that the NBC-based method with protein sequence features as input vectors performed well.
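The residue-level binary classification described in the abstract can be illustrated with a toy naive Bayes classifier. This is only a minimal sketch under stated assumptions: it uses the amino acid identity at each window position as a categorical feature instead of the 181-dimensional feature vector, and the `threshold` on the log-odds score is a hypothetical decision rule that is not claimed to match the threshold θ used in the study. All names (`windows`, `NaiveBayesResidueClassifier`, `PAD`) are illustrative.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for window positions beyond the termini


def windows(sequence, size):
    """Return one window of `size` residues centred on each position."""
    half = size // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + size] for i in range(len(sequence))]


class NaiveBayesResidueClassifier:
    """Categorical naive Bayes; each window position is one feature."""

    def __init__(self, size, alpha=1.0):
        self.size = size
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = {0: 0, 1: 0}
        # counts[label][position][symbol] -> number of occurrences
        self.counts = {c: [defaultdict(int) for _ in range(size)] for c in (0, 1)}

    def fit(self, sequences, labels):
        # labels: one string of '0'/'1' per sequence, '1' marks an interface residue
        for seq, lab in zip(sequences, labels):
            for window, y in zip(windows(seq, self.size), lab):
                y = int(y)
                self.class_counts[y] += 1
                for pos, symbol in enumerate(window):
                    self.counts[y][pos][symbol] += 1
        return self

    def log_odds(self, window):
        """log P(interface | window) - log P(non-interface | window)."""
        n_symbols = len(AMINO_ACIDS) + 1  # 20 residues plus padding
        total = sum(self.class_counts.values())
        score = {}
        for c in (0, 1):
            # assumes both classes occur in the training data
            s = math.log(self.class_counts[c] / total)
            for pos, symbol in enumerate(window):
                num = self.counts[c][pos][symbol] + self.alpha
                den = self.class_counts[c] + self.alpha * n_symbols
                s += math.log(num / den)
            score[c] = s
        return score[1] - score[0]

    def predict(self, sequence, threshold=0.0):
        """Label a residue 1 (interface) when its log-odds exceed `threshold`."""
        return [int(self.log_odds(w) > threshold) for w in windows(sequence, self.size)]


# Toy data (invented for illustration): lysines are the interface residues.
train_seqs = ["AKAKA", "GGKGG", "AAGAA"]
train_labels = ["01010", "00100", "00000"]
clf = NaiveBayesResidueClassifier(size=1).fit(train_seqs, train_labels)
print(clf.predict("GKA"))
print(clf.predict("GKA", threshold=-10.0))
```

Lowering the threshold labels more residues as interface, which raises sensitivity at the cost of specificity; this is the kind of trade-off a threshold scan explores.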
Year: 2015 PMID: 26697220 PMCID: PMC4677168 DOI: 10.1155/2015/978193
Source DB: PubMed Journal: Biochem Res Int
Figure 1. Schematic outline of the study procedure.
Figure 2. The number of neighboring interface residues at each position flanking an interface residue in Dset186. Position 0 is the interface residue; negative positions lie on its N-terminal side and positive positions on its C-terminal side.
Figure 3. The number of neighboring interface residues at each position flanking an interface residue in Dtestset72.
The ratio of actual interface residue number to subsequence length (%) in different windows with an interface residue at the central position in the training dataset.
| Window size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 33.07 | 46.84 | 20.07 | | | | | | | | |
| 5 | 18.08 | 32.14 | 27.04 | 16.52 | 6.19 | | | | | | |
| 7 | 10.09 | 19.64 | 27.13 | 21.08 | 13.40 | 6.64 | 1.96 | | | | |
| 9 | 6.57 | 11.93 | 20.89 | 21.72 | 17.68 | 11.93 | 6.07 | 2.46 | 0.66 | | |
| 11 | 5.33 | 10.00 | 16.88 | 18.89 | 17.50 | 13.80 | 8.65 | 5.12 | 2.23 | 1.01 | 0.21 |
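One plausible reading of this table is that each column gives the number of interface residues found in a window centred on an interface residue, and each cell gives the percentage of such windows. The sketch below computes that distribution under this reading. The function name, the label encoding (`'1'` for interface, `'0'` otherwise), and the choice to truncate windows at the sequence termini are assumptions, not details taken from the study.

```python
from collections import Counter


def interface_count_distribution(labels, size):
    """Percentage of windows (centred on an interface residue) that contain
    exactly k interface residues, for each observed k."""
    half = size // 2
    counts = Counter()
    for lab in labels:
        for i, y in enumerate(lab):
            if y != "1":
                continue  # only windows centred on an interface residue
            # assumption: windows are truncated at the sequence termini
            lo, hi = max(0, i - half), min(len(lab), i + half + 1)
            counts[lab[lo:hi].count("1")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: 100.0 * n / total for k, n in sorted(counts.items())}


# Toy labels, invented for illustration.
toy_labels = ["01110", "00100"]
print(interface_count_distribution(toy_labels, size=3))
```

Because the central residue is itself an interface residue, every window contains at least one, so the distribution starts at k = 1.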
The best LOOCV results for each window size on Dset186 across different thresholds.
| Window size | Sensitivity (%) | Precision (%) | Specificity (%) | ACC (%) | MCC (%) | F-measure (%) | Threshold θ |
|---|---|---|---|---|---|---|---|
| 1 | 40.6 | 13.5 | 67.5 | 64.5 | 9.5 | 20.2 | −1 |
| 3 | 53.1 | 14.5 | 60.9 | 60.0 | 8.9 | 22.7 | −0.82 |
| 5 | 60.4 | 14.5 | 55.7 | 56.2 | 10.2 | 23.4 | −0.98 |
| 7 | 54.3 | 15.1 | 62.2 | 61.3 | 10.5 | 23.7 | −0.82 |
| 9 | 56.9 | 15.2 | 60.4 | 60.0 | 11.0 | 23.9 | −0.88 |
| 11 | 56.0 | 15.1 | 60.8 | 60.3 | 10.7 | 23.8 | −0.86 |
| 13 | 59.2 | 14.8 | 57.8 | 58.0 | 10.7 | 23.7 | −0.96 |
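The evaluation measures reported in these tables follow the standard confusion-matrix definitions. As a minimal sketch, assuming those standard definitions apply here, they can be computed from the counts of true positives, false positives, true negatives, and false negatives. The function name and the example counts are illustrative and are not data from the study; the function also returns the F-measure (the harmonic mean of precision and sensitivity).

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures, returned as fractions in [0, 1]
    (MCC lies in [-1, 1]). Assumes no denominator is zero."""
    sensitivity = tp / (tp + fn)          # recall on interface residues
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)          # recall on non-interface residues
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "f_measure": f_measure,
    }


# Invented counts, for illustration only.
for name, value in classification_metrics(tp=50, fp=50, tn=350, fn=50).items():
    print(f"{name}: {100 * value:.1f}%")
```

As a consistency check, the harmonic mean of the window-size-1 sensitivity (40.6) and precision (13.5) is about 20.3, which is close to the 20.2 listed in that row; the small gap is what rounding of the inputs would produce.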
Figure 4. Dot plot of sensitivity versus specificity when NBC is used without a window.
The best model performance of NBC, ISIS, SPPIDER, and PSIVER tested on Dtestset72.
| Method | Sensitivity (%) | Precision (%) | Specificity (%) | ACC (%) | MCC (%) | F-measure (%) |
|---|---|---|---|---|---|---|
| NBC | 48.3 | 16.1 | 62.1 | 60.3 | 7.7 | 24.2 |
| ISIS | 35.0 | 21.0 | 76.2 | 70.9 | 9.1 | 26.3 |
| SPPIDER | 45.4 | 20.4 | 64.7 | 61.7 | 8.1 | 24.6 |
| PSIVER | 46.5 | 25.0 | 69.3 | 66.1 | 13.5 | 27.8 |
The best performance of machine learning algorithms tested on Dtestset72.
| Method | Sensitivity (%) | Precision (%) | Specificity (%) | ACC (%) | MCC (%) | F-measure (%) |
|---|---|---|---|---|---|---|
| NBC | 48.3 | 16.1 | 62.1 | 60.3 | 7.7 | 24.2 |
| SVM | 0.61 | 44.4 | 99.8 | 86.9 | 4.0 | 11.9 |
| RF | 2.5 | 19.5 | 98.4 | 85.9 | 2.5 | 4.5 |
| L1RG | 6.1 | 26.6 | 97.5 | 85.5 | 7.0 | 9.9 |