Yu-An Huang, Zhu-Hong You, Xing Chen, Keith Chan, Xin Luo.
Abstract
BACKGROUND: Proteins are essential molecules that participate, typically in pairs, in virtually every aspect of cellular function within an organism. Although high-throughput technologies have generated a considerable amount of protein-protein interaction (PPI) data for various species, experimental methods are both time-consuming and expensive. In addition, they are usually associated with high rates of both false positive and false negative results. Accordingly, a number of computational approaches have been developed to predict protein interactions effectively and accurately. However, most of these methods perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. Therefore, effective computational methods that predict PPIs using protein sequence information alone are urgently needed.
Year: 2016 PMID: 27112932 PMCID: PMC4845433 DOI: 10.1186/s12859-016-1035-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison among different L parameter values on Yeast dataset
| L | Dimension | Acc. (%) | Prec. (%) | Sen. (%) | MCC (%) |
|---|---|---|---|---|---|
| 4 | 120 | 96.09 ± 0.33 | 100.00 ± 0.00 | 92.18 ± 0.72 | 92.47 ± 0.62 |
| 5 | 150 | 96.82 ± 0.43 | 100.00 ± 0.00 | 93.63 ± 0.87 | 93.83 ± 0.81 |
| 6 | 180 | 96.66 ± 0.30 | 100.00 ± 0.00 | 93.32 ± 0.56 | 93.52 ± 0.56 |
| 8 | 240 | 96.39 ± 0.16 | 100.00 ± 0.00 | 92.78 ± 0.20 | 93.02 ± 0.28 |
| 12 | 360 | 96.28 ± 0.43 | 100.00 ± 0.00 | 92.57 ± 0.81 | 92.82 ± 0.80 |
| 16 | 480 | 96.16 ± 0.51 | 100.00 ± 0.00 | 92.32 ± 1.00 | 92.59 ± 0.95 |
5-fold cross validation result obtained in predicting Yeast PPIs dataset
| Test set | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| 1 | 96.20 | 100.00 | 92.34 | 92.66 | 96.62 |
| 2 | 97.23 | 100.00 | 94.32 | 94.59 | 97.11 |
| 3 | 96.74 | 100.00 | 93.55 | 93.68 | 96.67 |
| 4 | 96.69 | 100.00 | 93.40 | 93.59 | 96.83 |
| 5 | 97.23 | 100.00 | 94.56 | 94.61 | 97.15 |
| Average | 96.82 ± 0.43 | 100.00 ± 0.00 | 93.63 ± 0.87 | 93.83 ± 0.81 | 96.88 ± 0.24 |
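The "Average" row can be reproduced from the per-fold values. The short sketch below recomputes the accuracy entry for the Yeast table; the use of the sample standard deviation (n − 1 denominator) is an assumption, chosen because it matches the reported spread.

```python
from statistics import mean, stdev

# Per-fold accuracy (%) copied from the Yeast cross-validation table above.
fold_accuracy = [96.20, 97.23, 96.74, 96.69, 97.23]

avg = mean(fold_accuracy)
spread = stdev(fold_accuracy)  # sample standard deviation (assumption: n - 1 denominator)

summary = f"{avg:.2f} ± {spread:.2f}"
print(summary)  # 96.82 ± 0.43
```

The same two calls applied to any other column should give that column's "Average" entry, up to rounding.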
5-fold cross validation result obtained in predicting H. pylori PPIs dataset
| Test set | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|
| 1 | 93.14 | 97.05 | 89.15 | 87.19 | 94.64 |
| 2 | 92.80 | 95.73 | 89.97 | 86.62 | 93.60 |
| 3 | 92.28 | 97.34 | 87.07 | 85.69 | 93.14 |
| 4 | 93.31 | 93.24 | 92.91 | 87.50 | 94.49 |
| 5 | 92.64 | 97.30 | 87.50 | 86.27 | 92.89 |
| Average | 92.83 ± 0.41 | 96.13 ± 1.75 | 89.32 ± 2.33 | 86.65 ± 0.72 | 93.75 ± 0.79 |
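The Accu., Prec., Sen. and MCC columns in the cross-validation tables are not defined in this section. The sketch below assumes the standard confusion-matrix definitions; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, precision, sensitivity and MCC as percentages."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "sensitivity": 100 * sensitivity,
        "mcc": 100 * mcc,
    }


# Hypothetical counts: no false positives, ten missed interactions.
example = binary_metrics(tp=90, fp=0, tn=100, fn=10)
print(example)
```

With no false positives, precision is 100 % while accuracy and MCC are limited by the missed positives, which is the same pattern the Yeast results show.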
Fig. 1 ROC from proposed method result for Yeast PPIs dataset
Fig. 2 ROC from proposed method result for H. pylori PPIs dataset
5-fold cross validation result obtained in predicting Human PPIs dataset
| Classification model | Testing set | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| Proposed method | 1 | 98.22 | 100.00 | 96.30 | 96.50 | 98.03 |
| | 2 | 97.73 | 99.73 | 95.47 | 95.54 | 98.17 |
| | 3 | 97.55 | 99.87 | 95.04 | 95.20 | 97.94 |
| | 4 | 97.30 | 99.73 | 94.61 | 94.72 | 97.30 |
| | 5 | 97.49 | 99.73 | 94.97 | 95.08 | 97.57 |
| | Average | 97.66 ± 0.35 | 99.81 ± 0.12 | 95.28 ± 0.65 | 95.41 ± 0.68 | 97.80 ± 0.36 |
| SVM | 1 | 91.79 | 96.70 | 85.84 | 84.75 | 96.43 |
| | 2 | 91.97 | 97.63 | 85.12 | 84.99 | 95.30 |
| | 3 | 90.63 | 96.21 | 83.86 | 82.78 | 95.90 |
| | 4 | 91.97 | 97.51 | 85.37 | 85.02 | 96.55 |
| | 5 | 91.73 | 97.20 | 85.05 | 84.60 | 96.44 |
| | Average | 91.62 ± 0.57 | 97.05 ± 0.59 | 85.05 ± 0.73 | 84.43 ± 0.94 | 96.12 ± 0.52 |
Fig. 3 ROC from proposed method result for Human PPIs dataset
Fig. 4 ROC from SVM-based method result for Human PPIs dataset
Experimental results yielded by combining 2-MER and WSRC on H. pylori dataset
| Classification model | Testing set | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) | AUC(%) |
|---|---|---|---|---|---|---|
| 2-MER with WSRC | 1 | 82.85 | 82.32 | 85.05 | 71.50 | 88.53 |
| | 2 | 86.79 | 86.88 | 85.96 | 77.06 | 89.32 |
| | 3 | 85.25 | 81.67 | 89.75 | 74.78 | 90.32 |
| | 4 | 86.11 | 86.69 | 88.05 | 75.83 | 90.13 |
| | 5 | 83.39 | 78.62 | 88.19 | 72.20 | 89.74 |
| | Average | 84.88 ± 1.71 | 83.23 ± 3.53 | 87.40 ± 1.88 | 74.27 ± 2.37 | 89.61 ± 0.71 |
| Proposed model | Average | 92.83 ± 0.41 | 96.13 ± 1.75 | 89.32 ± 2.33 | 86.65 ± 0.72 | 93.75 ± 0.79 |
Fig. 5 ROC yielded by combining 2-MER and WSRC
Prediction results on five species based on our model
| Species | Test pairs | Accuracy |
|---|---|---|
| | 21975 | 89.35 % |
| | 6954 | 72.92 % |
| | 4013 | 88.99 % |
| | 1412 | 88.81 % |
| | 1420 | 85.77 % |
| | 313 | 83.39 % |
Performance comparison of different methods on the Yeast dataset
| Model | Test set | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) |
|---|---|---|---|---|---|
| Guo's work [ | ACC | 89.33 ± 2.67 | 88.87 ± 6.16 | 89.93 ± 3.68 | N/A |
| | AC | 87.36 ± 1.38 | 87.82 ± 4.33 | 87.30 ± 4.68 | N/A |
| Zhou's work [ | SVM + LD | 88.56 ± 0.33 | 89.50 ± 0.60 | 87.37 ± 0.22 | 77.15 ± 0.68 |
| Yang's work [ | Cod1 | 75.08 ± 1.13 | 74.75 ± 1.23 | 75.81 ± 1.20 | N/A |
| | Cod2 | 80.04 ± 1.06 | 82.17 ± 1.35 | 76.77 ± 0.69 | N/A |
| | Cod3 | 80.41 ± 0.47 | 81.86 ± 0.99 | 78.14 ± 0.90 | N/A |
| | Cod4 | 86.15 ± 1.17 | 90.24 ± 1.34 | 81.03 ± 1.74 | N/A |
| Proposed method | WSRC | 96.82 ± 0.43 | 100.00 ± 0.00 | 93.63 ± 0.87 | 93.83 ± 0.81 |
Performance comparison of different methods on the H. pylori dataset
| Model | Accu.(%) | Prec.(%) | Sen.(%) | MCC(%) |
|---|---|---|---|---|
| Phylogenetic bootstrap [ | 75.80 | 80.20 | 69.80 | N/A |
| HKNN [ | 84.00 | 84.00 | 86.00 | N/A |
| Signature products [ | 83.40 | 85.70 | 79.90 | N/A |
| Ensemble of HKNN [ | 86.60 | 85.00 | 86.70 | N/A |
| Boosting [ | 79.52 | 81.69 | 80.37 | 70.64 |
| Proposed method | 92.83 | 96.13 | 89.32 | 86.65 |
Amino acid classification
| Amino acid classification | |
|---|---|
| Aliphatic amino acid: | C1 = {A,V,L,I,M,C} |
| Aromatic amino acid: | C2 = {F,W,Y} |
| Polar amino acid: | C3 = {S,T,N,Q} |
| Positive amino acid: | C4 = {K,R} |
| Negative amino acid: | C5 = {D,E} |
| Special conformations: | C6 = {G,P} |
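The classification can be applied to a protein sequence with a simple lookup. The sketch below is illustrative only: the names `AMINO_ACID_CLASSES` and `to_class_sequence` are hypothetical, the example peptide is made up, and mapping residues that the table does not list to class 0 is an assumption of this sketch rather than something the source specifies.

```python
# Six classes as listed in the table above (C1..C6).
AMINO_ACID_CLASSES = {
    1: "AVLIMC",  # aliphatic
    2: "FWY",     # aromatic
    3: "STNQ",    # polar
    4: "KR",      # positive
    5: "DE",      # negative
    6: "GP",      # special conformations
}

# Invert the table: residue letter -> class index.
RESIDUE_TO_CLASS = {
    residue: index
    for index, members in AMINO_ACID_CLASSES.items()
    for residue in members
}


def to_class_sequence(protein: str) -> list:
    """Map each residue to its class index; unlisted residues become 0 (assumption)."""
    return [RESIDUE_TO_CLASS.get(residue, 0) for residue in protein.upper()]


# Hypothetical peptide, for illustration only.
classes = to_class_sequence("MKVLDG")
print(classes)  # [1, 4, 1, 1, 5, 6]
```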
Example for the process of descriptors’ extraction
| Subsequence: | 1 0 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0 1 1 0 0 1 1 |
| Position of ‘0’: | 0 0 0 0 0 0 0 0 0 0 0 0 |
| Position of ‘1’: | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 |
| ‘1-0’ transition: | 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 |
| ‘0-1’ transition: | 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 |
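The quantities in this example can be counted directly from the binary subsequence. The sketch below tallies the symbols and the adjacent-pair transitions; the function name `count_descriptors` is hypothetical, and how the authors turn these counts into final descriptor values is not shown in this section, so only the raw counts are computed.

```python
def count_descriptors(subsequence: str) -> dict:
    """Count symbols and adjacent-pair transitions in a binary string."""
    # Every pair of neighbouring symbols, e.g. "101" -> [("1","0"), ("0","1")].
    pairs = list(zip(subsequence, subsequence[1:]))
    return {
        "zeros": subsequence.count("0"),
        "ones": subsequence.count("1"),
        "one_to_zero": pairs.count(("1", "0")),
        "zero_to_one": pairs.count(("0", "1")),
    }


# The 28-symbol subsequence from the table above, written without spaces.
subsequence = "1010011110011010101010110011"
counts = count_descriptors(subsequence)
print(counts)
# {'zeros': 12, 'ones': 16, 'one_to_zero': 9, 'zero_to_one': 9}
```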
Example for characteristic sequence partition
| | Sequence | Length |
|---|---|---|
| Sn: | 101001111001101010101011001101011010010110110101000100010 | 57 |
| SubS1: | 101001111 | 9 |
| SubS2: | 1010011110011010101 | 19 |
| SubS3: | 1010011110011010101010110011 | 28 |
| SubS4: | 10100111100110101010101100110101101001 | 38 |
| SubS5: | 10100111100110101010101100110101101001011011010 | 47 |
| SubS6: | 101001111001101010101011001101011010010110110101000100010 | 57 |
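In this example each subsequence is a prefix of the full characteristic sequence, and the listed lengths (9, 19, 28, 38, 47, 57) agree with floor(k · n / L) for n = 57 and L = 6. The sketch below assumes that rule, which is inferred from the table rather than stated in this section; the function name `prefix_subsequences` is hypothetical.

```python
def prefix_subsequences(sequence: str, parts: int) -> list:
    """Return `parts` nested prefixes with lengths floor(k * n / parts), k = 1..parts."""
    n = len(sequence)
    return [sequence[: (k * n) // parts] for k in range(1, parts + 1)]


# Characteristic sequence Sn from the table above.
s_n = "101001111001101010101011001101011010010110110101000100010"
subs = prefix_subsequences(s_n, parts=6)
lengths = [len(s) for s in subs]
print(lengths)  # [9, 19, 28, 38, 47, 57]
```

Because the prefixes are nested, the last one always equals the full sequence.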