Ji-Yong An1,2, Yong Zhou1,2, Yu-Jun Zhao1,2, Zi-Ji Yan1,2.
Abstract
BACKGROUND: Increasing evidence indicates that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Uncovering potential PPIs therefore remains an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifiers to identify PPIs.
Keywords: Protein-protein interactions; local coding; multifeatures fusion; position-specific scoring matrix; support vector machine
Year: 2019 PMID: 31619921 PMCID: PMC6777060 DOI: 10.1177/1176934319879920
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. The flow diagram of local coding based on PSSM matrix. PSSM indicates position-specific scoring matrix.
Figure 2. The flow diagram of our feature extraction algorithm. BP indicates Bigram Probability; LAG, Local Average Group.
The abbreviations of different feature extraction methods.
| Feature extraction method | Abbreviation |
|---|---|
| Multifeatures fusion based on local coding PSSM matrix | LCPSSMMF |
| Multifeatures fusion based on original protein sequence PSSM matrix | PSSMMF |
| Local Average Group based on local coding PSSM matrix | LCPSSMLAG |
| Bigram Probabilities based on local coding PSSM matrix | LCPSSMBP |
Abbreviations: LCPSSMMF, local coding position-specific scoring matrix with multifeatures fusion; PSSM, position-specific scoring matrix; LAG, Local Average Group; BP, Bigram Probabilities.
The experimental results of the LCPSSMMF method on the yeast dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 94.05 | 94.14 | 93.89 | 88.82 |
| 2 | 93.07 | 91.93 | 93.74 | 87.09 |
| 3 | 93.02 | 93.02 | 94.19 | 87.95 |
| 4 | 92.40 | 91.26 | 93.42 | 85.95 |
| 5 | 93.17 | 92.19 | 94.26 | 87.27 |
| Average | 93.14 ± 0.60 | 92.50 ± 1.1 | 93.90 ± 0.34 | 87.41 ± 1.06 |
Abbreviations: LCPSSMMF, local coding position-specific scoring matrix with multifeatures fusion; MCC, Matthews correlation coefficient.
The experimental results of the PSSMMF method on the yeast dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 90.75 | 90.81 | 90.57 | 83.20 |
| 2 | 89.85 | 88.82 | 90.22 | 81.74 |
| 3 | 89.58 | 90.37 | 89.19 | 81.33 |
| 4 | 91.46 | 91.08 | 91.82 | 84.38 |
| 5 | 90.76 | 90.23 | 91.12 | 83.55 |
| Average | 90.48 ± 0.76 | 90.26 ± 0.87 | 90.58 ± 0.98 | 82.84 ± 1.27 |
Abbreviations: MCC, Matthews correlation coefficient; PSSMMF, position-specific scoring matrix with multifeatures fusion.
The experimental results of the LCPSSMBG method on the yeast dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 88.41 | 85.32 | 92.75 | 80.98 |
| 2 | 87.78 | 84.78 | 91.58 | 79.98 |
| 3 | 89.16 | 85.78 | 92.65 | 80.95 |
| 4 | 88.76 | 87.15 | 92.00 | 81.60 |
| 5 | 90.06 | 88.07 | 93.40 | 83.21 |
| Average | 88.83 ± 0.85 | 86.22 ± 1.36 | 92.47 ± 0.71 | 81.34 ± 1.19 |
Abbreviation: MCC, Matthews correlation coefficient.
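The "Average" row can be checked from the five per-run values. The sketch below does this for the accuracy column of the table above, assuming the ± value is the sample standard deviation (n − 1 denominator); the source does not state which estimator was used, and the helper name `mean_sd` is hypothetical.

```python
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) of a list of per-run scores."""
    return statistics.mean(values), statistics.stdev(values)


# Accuracy (%) of the five test runs, copied from the table above.
accuracy_runs = [88.41, 87.78, 89.16, 88.76, 90.06]

avg, sd = mean_sd(accuracy_runs)
print(f"{avg:.2f} ± {sd:.2f}")
```

Under that assumption the result agrees with the reported average of 88.83 ± 0.85.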
The experimental results of the LCPSSMMF method on the human dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 89.42 | 90.96 | 88.11 | 81.07 |
| 2 | 90.89 | 92.74 | 89.82 | 83.41 |
| 3 | 90.25 | 93.94 | 88.06 | 82.29 |
| 4 | 89.87 | 94.16 | 86.15 | 81.75 |
| 5 | 91.61 | 95.94 | 88.00 | 84.60 |
| Average | 90.40 ± 0.86 | 93.54 ± 1.84 | 88.03 ± 1.30 | 82.62 ± 1.40 |
Abbreviations: LCPSSMMF, local coding position-specific scoring matrix with multifeatures fusion; MCC, Matthews correlation coefficient.
The experimental results of the PSSMMF method on the human dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 86.72 | 84.62 | 88.35 | 76.95 |
| 2 | 87.11 | 88.99 | 85.04 | 77.53 |
| 3 | 88.49 | 87.40 | 87.29 | 78.11 |
| 4 | 87.36 | 88.15 | 87.61 | 77.88 |
| 5 | 88.26 | 89.30 | 87.91 | 78.74 |
| Average | 87.58 ± 0.75 | 87.69 ± 1.90 | 87.24 ± 1.56 | 77.84 ± 0.67 |
Abbreviations: PSSMMF, position-specific scoring matrix with multifeatures fusion; MCC, Matthews correlation coefficient.
The experimental results of the LCPSSMBG method on the human dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 85.21 | 84.92 | 87.05 | 76.21 |
| 2 | 84.67 | 83.46 | 86.75 | 74.01 |
| 3 | 86.47 | 87.58 | 85.24 | 76.59 |
| 4 | 86.98 | 85.46 | 88.06 | 77.34 |
| 5 | 88.80 | 88.76 | 88.41 | 80.10 |
| Average | 86.42 ± 1.62 | 86.03 ± 2.21 | 87.10 ± 1.24 | 76.85 ± 2.20 |
Abbreviation: MCC, Matthews correlation coefficient.
Figure 3. Comparison of ROC curves based on different feature extraction methods and SVM classifier on the yeast dataset. ROC indicates receiver operating characteristic; SVM, support vector machine.
Figure 4. Comparison of ROC curves based on different feature extraction methods and SVM classifier on the human dataset. LCPSSMMF indicates local coding position-specific scoring matrix (PSSM) with multifeatures fusion; ROC, receiver operating characteristic; SVM, support vector machine.
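For readers unfamiliar with how such ROC curves are obtained, the sketch below builds the curve by sweeping a threshold over classifier scores and integrates it with the trapezoidal rule. It is a generic illustration, not the authors' code: the function names, labels, and scores are hypothetical, and it assumes the plotted curves come from continuous SVM outputs such as decision values.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from binary labels (1/0) and real-valued scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

A curve closer to the top-left corner, and hence a larger area, indicates better separation of interacting and non-interacting pairs.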
The prediction results of different feature extraction methods on the yeast dataset.
| Methods | Accuracy (%) | Sensitivity (%) | Precision (%) |
|---|---|---|---|
| LCPSSMMF | 93.14 | 92.50 | 93.90 |
| AC | 87.36 | 87.30 | 87.82 |
| ACC | 89.33 | 89.93 | 88.87 |
| GE | 91.73 | 85.05 | 97.05 |
| LD | 88.56 | 87.37 | 89.50 |
Abbreviations: AC, auto covariance; LCPSSMMF, local coding position-specific scoring matrix with multifeatures fusion; LD, local descriptors; ACC, auto cross covariance; GE, global encoding.
The experimental results of the LCPSSMAB method on the yeast dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 86.14 | 86.50 | 86.35 | 76.11 |
| 2 | 86.59 | 85.88 | 87.57 | 76.77 |
| 3 | 88.29 | 87.05 | 88.98 | 79.31 |
| 4 | 86.46 | 86.94 | 86.09 | 76.58 |
| 5 | 87.98 | 86.52 | 86.67 | 77.88 |
| Average | 87.09 ± 0.97 | 86.57 ± 0.46 | 87.13 ± 1.17 | 77.33 ± 1.28 |
Abbreviation: MCC, Matthews correlation coefficient.
The experimental results of the LCPSSMAB method on the human dataset.
| Testing times | Accuracy (%) | Sensitivity (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 85.63 | 84.52 | 86.30 | 75.38 |
| 2 | 85.18 | 83.01 | 86.69 | 74.73 |
| 3 | 85.31 | 85.37 | 85.48 | 74.94 |
| 4 | 86.91 | 85.53 | 87.37 | 77.22 |
| 5 | 86.30 | 86.02 | 87.10 | 76.34 |
| Average | 85.86 ± 0.72 | 84.89 ± 1.18 | 86.59 ± 0.75 | 75.72 ± 1.04 |
Abbreviation: MCC, Matthews correlation coefficient.