Zhu-Hong You1, Shuai Li2, Xin Gao3, Xin Luo2, Zhen Ji1.
Abstract
Protein-protein interactions (PPIs) are the basis of biological functions, and studying these interactions at the molecular level is crucial for understanding the functionality of a living cell. During the past decade, biosensors have emerged as an important tool for the high-throughput identification of proteins and their interactions. However, high-throughput experimental methods for identifying PPIs are both time-consuming and expensive. Moreover, high-throughput PPI data are often associated with high false-positive and false-negative rates. To address these problems, we propose a method for PPI detection that integrates biosensor-based PPI data with a novel computational model. The method is based on the extreme learning machine (ELM) algorithm combined with a novel protein sequence descriptor. When applied to a large-scale human protein interaction dataset, the proposed method achieved 84.8% prediction accuracy with 84.08% sensitivity at a specificity of 85.53%. We conducted more extensive experiments to compare the proposed method with a state-of-the-art technique, the support vector machine (SVM). The results demonstrate that our approach is very promising for detecting new PPIs and can be a helpful supplement to biosensor-based PPI detection.
Year: 2014 PMID: 25215285 PMCID: PMC4151593 DOI: 10.1155/2014/598129
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The schematic diagram for mapping large-scale protein-protein interactions by integrating biosensor data with the ELM model.
Division of amino acids into seven groups based on the dipoles and volumes of the side chains.
| Group | Class | Dipole scale | Volume scale |
|---|---|---|---|
| 1 | Ala, Gly, Val | Dipole < 1.0 | Volume < 50 |
| 2 | Ile, Leu, Phe, Pro | Dipole < 1.0 | Volume > 50 |
| 3 | Tyr, Met, Thr, Ser | 1.0 < dipole < 2.0 | Volume > 50 |
| 4 | His, Asn, Gln, Trp | 2.0 < dipole < 3.0 | Volume > 50 |
| 5 | Arg, Lys | Dipole > 3.0 | Volume > 50 |
| 6 | Asp, Glu | Dipole > 3.0 | Volume > 50 |
| 7 | Cys | 1.0 < dipole < 2.0 | Volume > 50 |
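The grouping in the table can be turned into a lookup that rewrites a protein sequence over a seven-letter class alphabet. The following is a minimal sketch, not the authors' code: the one-letter residue codes are the standard ones, while the names `GROUPS`, `RESIDUE_TO_GROUP`, and `encode_sequence` are hypothetical, and skipping non-standard residues is an assumption.

```python
# Seven amino-acid classes from the table, written with one-letter codes.
GROUPS = {
    1: "AGV",   # Ala, Gly, Val
    2: "ILFP",  # Ile, Leu, Phe, Pro
    3: "YMTS",  # Tyr, Met, Thr, Ser
    4: "HNQW",  # His, Asn, Gln, Trp
    5: "RK",    # Arg, Lys
    6: "DE",    # Asp, Glu
    7: "C",     # Cys
}

# Invert the table into a residue -> class lookup.
RESIDUE_TO_GROUP = {aa: g for g, residues in GROUPS.items() for aa in residues}


def encode_sequence(sequence: str) -> str:
    """Replace each residue by its class digit (hypothetical helper).

    Assumption: characters outside the 20 standard residues are skipped.
    """
    return "".join(
        str(RESIDUE_TO_GROUP[aa]) for aa in sequence.upper() if aa in RESIDUE_TO_GROUP
    )


if __name__ == "__main__":
    print(encode_sequence("MKVLAC"))  # Met, Lys, Val, Leu, Ala, Cys
```

Encoding first keeps the later descriptor computation independent of the 20-letter alphabet.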
Figure 2. Sequence of a hypothetical protein illustrating the construction of the composition, transition, and distribution descriptors of a protein region.
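As an illustration of the three descriptor types named in the caption, here is a sketch that follows the commonly used composition-transition-distribution conventions. It assumes the input is already a string of class digits `1` to `7`; the function names, the rank rounding in `distribution`, and the zero fill for absent classes are assumptions, and the paper's splitting into protein regions is not reproduced.

```python
from itertools import combinations

CLASSES = "1234567"  # class alphabet after grouping the amino acids


def composition(encoded):
    """Fraction of positions occupied by each class (7 values)."""
    n = len(encoded)
    return [encoded.count(c) / n for c in CLASSES]


def transition(encoded):
    """Frequency of adjacent pairs that switch between two classes (21 values)."""
    pairs = list(zip(encoded, encoded[1:]))
    out = []
    for a, b in combinations(CLASSES, 2):
        switches = sum(1 for p, q in pairs if (p, q) in ((a, b), (b, a)))
        out.append(switches / len(pairs))
    return out


def distribution(encoded):
    """Relative positions of the first, 25%, 50%, 75% and last occurrence
    of each class (35 values). Absent classes contribute zeros (assumption)."""
    n = len(encoded)
    out = []
    for c in CLASSES:
        positions = [i + 1 for i, x in enumerate(encoded) if x == c]
        if not positions:
            out.extend([0.0] * 5)
            continue
        m = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            rank = max(1, int(frac * m))  # rounding convention is an assumption
            out.append(positions[rank - 1] / n)
    return out


def ctd_descriptor(encoded):
    """Concatenate the three parts into one 63-dimensional vector."""
    if len(encoded) < 2:
        raise ValueError("need at least two residues to count transitions")
    return composition(encoded) + transition(encoded) + distribution(encoded)
```

For the toy string `"1221"`, the composition of classes 1 and 2 is 0.5 each, and two of the three adjacent pairs switch between class 1 and class 2.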
Figure 3. The structure of the extreme learning machine.
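For readers unfamiliar with the model, the sketch below shows the usual single-hidden-layer ELM idea: hidden weights are drawn at random and left fixed, and only the output weights are fitted by least squares. It is a dependency-free illustration under stated assumptions, not the authors' implementation: the class name `ELM`, the weight range, and the small ridge term (used here in place of an explicit Moore-Penrose pseudo-inverse so that a plain linear solve suffices) are all choices made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class ELM:
    """Single-hidden-layer ELM with sigmoid units and one real-valued output."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-8, seed=0):
        rng = random.Random(seed)
        # Random input weights and biases are never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge  # assumption: small regulariser for numerical stability
        self.beta = None

    def _hidden(self, x):
        return [
            sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(self.w, self.b)
        ]

    def fit(self, X, t):
        H = [self._hidden(x) for x in X]  # hidden-layer output matrix
        L = len(self.w)
        # Normal equations: (H^T H + ridge * I) beta = H^T t
        A = [
            [sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0) for j in range(L)]
            for i in range(L)
        ]
        c = [sum(h[i] * ti for h, ti in zip(H, t)) for i in range(L)]
        self.beta = solve(A, c)
        return self

    def predict(self, X):
        return [sum(h * bt for h, bt in zip(self._hidden(x), self.beta)) for x in X]


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    t = [0, 1, 1, 0]  # XOR toy problem, not PPI data
    model = ELM(n_inputs=2, n_hidden=20).fit(X, t)
    print([round(p, 3) for p in model.predict(X)])
```

Because training reduces to one linear solve, fitting is fast; the number of hidden neurons is the main setting to tune.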
Figure 4. The relationship between the prediction accuracy and the number of hidden neurons. The x-axis denotes the number of hidden neurons as a percentage of the number of samples, and the y-axis shows the corresponding accuracy.
Figure 5. The relationship between the running time and the number of hidden neurons. The x-axis denotes the number of hidden neurons as a percentage of the number of samples, and the y-axis shows the running time.
Comparison of the prediction performance of the proposed method and the state-of-the-art SVM classifier on the human dataset.
| Method | Kernel | Mean/std | Time (s) | ACC | SN | SP | PPV | NPV | F1 | MCC | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Testing | | | | | | | | | | | |
| ELM | Sigmoid | Mean | 72.7901 | 0.8480 | 0.8408 | 0.8553 | 0.8547 | 0.8415 | 0.8477 | 0.7422 | 0.9232 |
| ELM | Sigmoid | Variance | 1.9062 | 0.0022 | 0.0019 | 0.0028 | 0.0040 | 0.0038 | 0.0029 | 0.0030 | 0.0028 |
| ELM | Hardlim | Mean | 77.4139 | 0.8206 | 0.8171 | 0.8242 | 0.8227 | 0.8185 | 0.8199 | 0.7056 | 0.9020 |
| ELM | Hardlim | Variance | 3.7710 | 0.0050 | 0.0040 | 0.0063 | 0.0088 | 0.0026 | 0.0063 | 0.0064 | 0.0031 |
| ELM | Gaussian | Mean | 76.9615 | 0.7257 | 0.7328 | 0.7186 | 0.7232 | 0.7283 | 0.7279 | 0.6018 | 0.7624 |
| ELM | Gaussian | Variance | 4.1012 | 0.0036 | 0.0048 | 0.0054 | 0.0085 | 0.0077 | 0.0044 | 0.0033 | 0.0017 |
| Training | | | | | | | | | | | |
| ELM | Sigmoid | Mean | 1282.12 | 0.8887 | 0.8831 | 0.8944 | 0.8933 | 0.8843 | 0.8882 | 0.8022 | 0.9561 |
| ELM | Sigmoid | Variance | 17.25 | 0.0006 | 0.0010 | 0.0018 | 0.0014 | 0.0001 | 0.0008 | 0.0010 | 0.0012 |
| ELM | Hardlim | Mean | 1330.33 | 0.8668 | 0.8655 | 0.8682 | 0.8683 | 0.8654 | 0.8669 | 0.7691 | 0.9397 |
| ELM | Hardlim | Variance | 46.28 | 0.0027 | 0.0021 | 0.0033 | 0.0027 | 0.0027 | 0.0024 | 0.0039 | 0.0031 |
| ELM | Gaussian | Mean | 1435.45 | 0.7824 | 0.7896 | 0.7753 | 0.7790 | 0.7860 | 0.7843 | 0.6595 | 0.8626 |
| ELM | Gaussian | Variance | 94.85 | 0.0033 | 0.0022 | 0.0053 | 0.0040 | 0.0026 | 0.0029 | 0.0037 | 0.0038 |
| Testing | | | | | | | | | | | |
| SVM | Sigmoid | Mean | 2794.29 | 0.8177 | 0.8119 | 0.8232 | 0.8215 | 0.8144 | 0.8165 | 0.7018 | 0.8878 |
| SVM | Sigmoid | Variance | 16.71 | 0.0127 | 0.0266 | 0.0128 | 0.0067 | 0.0200 | 0.0155 | 0.0160 | 0.0143 |
| SVM | Gaussian | Mean | 5237.89 | 0.6947 | 0.4714 | 0.9191 | 0.8535 | 0.6348 | 0.6064 | 0.5320 | 0.8997 |
| SVM | Gaussian | Variance | 67.82 | 0.0228 | 0.0412 | 0.0112 | 0.0178 | 0.0265 | 0.0340 | 0.0276 | 0.0364 |
| SVM | Polynomial | Mean | 3612.98 | 0.8019 | 0.8219 | 0.7819 | 0.7903 | 0.8144 | 0.8057 | 0.6820 | 0.8838 |
| SVM | Polynomial | Variance | 20.16 | 0.0101 | 0.0126 | 0.0117 | 0.0165 | 0.0114 | 0.0125 | 0.0122 | 0.0138 |
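The column abbreviations in the table correspond to standard confusion-matrix statistics. The sketch below computes them from raw counts using the textbook definitions. This excerpt does not show the formulas the authors used, so treating them as identical is an assumption, in particular for MCC; the function name and the example counts are hypothetical.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Textbook binary-classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)            # sensitivity (recall)
    sp = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sn / (ppv + sn)  # harmonic mean of PPV and SN
    # Standard Matthews correlation coefficient; may differ from the
    # definition behind the MCC column (assumption).
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"ACC": acc, "SN": sn, "SP": sp, "PPV": ppv, "NPV": npv, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in classification_metrics(tp=8, fp=2, tn=7, fn=3).items():
        print(f"{name}: {value:.4f}")
```

As a consistency check on the table, the harmonic mean of the reported ELM sigmoid testing PPV (0.8547) and SN (0.8408) rounds to the reported F1 of 0.8477.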
Figure 6. The ROC (receiver operating characteristic) curves illustrating the performance of the different activation functions. Each curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity).
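To make the construction of such a curve concrete, the following sketch sweeps a decision threshold over classifier scores, records the (false positive rate, true positive rate) points, and integrates them with the trapezoidal rule to obtain the AUC. The function names and the toy labels and scores are hypothetical and unrelated to the paper's data.

```python
def roc_curve(labels, scores):
    """Return (FPR, TPR) points from the strictest to the loosest threshold.

    labels: 1 for interacting pairs, 0 for non-interacting pairs.
    scores: higher means more likely to be positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(auc(roc_curve(labels, scores)))  # 8 of 9 positive/negative pairs are ranked correctly
```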