Tianchuan Du, Li Liao, Cathy H Wu.
Abstract
Identifying the residues in a protein that are involved in protein-protein interaction and identifying the contact matrix for a pair of interacting proteins are two computational tasks at different levels of an in-depth analysis of protein-protein interaction. Various methods for solving these two problems have been reported in the literature. However, interacting residue prediction and contact matrix prediction were handled largely independently in those existing methods, even though, intuitively, good prediction of interacting residues should help with predicting the contact matrix. In this work, we developed a novel protein interacting residue prediction system, the contact matrix-interaction profile hidden Markov model (CM-ipHMM), which integrates contact matrix prediction with ipHMM interaction residue prediction. We propose to leverage what is learned from the contact matrix prediction and to use the predicted contact matrix as "feedback" to enhance the interaction residue prediction. The CM-ipHMM model showed significant improvement over the previous method, which uses the ipHMM alone to predict interaction residues. This indicates that the downstream contact matrix prediction can help the interaction site prediction.
Keywords: Contact matrix prediction; Interaction site prediction; Machine learning; Protein-protein interaction
Year: 2016 PMID: 27818677 PMCID: PMC5075339 DOI: 10.1186/s13637-016-0051-z
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
Fig. 1 The architecture of the interaction profile hidden Markov model. The match states of the classical pHMM are split into non-interacting (M_ni) and interacting (M_i) match states. Image credit: Friedrich et al. [23]
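To make the split of match states concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified profile with one insert and one delete state per column, and the state labels (`Mni`, `Mi`, `I`, `D`) and both function names are illustrative. It enumerates the states and shows how a decoded state path could be turned into per-residue interaction labels.

```python
from typing import List


def iphmm_states(length: int) -> List[str]:
    """Enumerate per-column states for a profile with `length` columns.

    Illustrative naming: 'Mni' = non-interacting match, 'Mi' = interacting
    match, 'I' = insert, 'D' = delete.
    """
    states: List[str] = []
    for k in range(1, length + 1):
        states += [f"Mni{k}", f"Mi{k}", f"I{k}", f"D{k}"]
    return states


def labels_from_path(path: List[str]) -> List[int]:
    """Map a decoded state path to one 0/1 interaction label per residue.

    Delete states emit no residue and are skipped. A residue emitted by an
    interacting match state is labeled 1; all other residues are labeled 0.
    """
    labels: List[int] = []
    for state in path:
        if state.startswith("D"):
            continue
        labels.append(1 if state.startswith("Mi") else 0)
    return labels


# A hypothetical decoded path over four profile columns.
path = ["Mni1", "Mi2", "I2", "D3", "Mi4"]
print(labels_from_path(path))  # [0, 1, 0, 1]
```

In this sketch, decoding itself (for example with the Viterbi algorithm) is left out; only the bookkeeping that the split match states make possible is shown.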
Fig. 2 Integrated machine learning classifier combining contact matrix prediction and ipHMM prediction. The green column and the blue row are the ipHMM site prediction results for sequence A and sequence B, respectively. The binary matrix is the predicted contact matrix between sequence A and sequence B
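One way to picture this integration is sketched below. It is a hedged illustration under stated assumptions, not the feature set or model reported by the authors: the three features per residue, the function names, and the hand-set weights are all hypothetical. In practice the weights of a logistic regression model would be learned from training data.

```python
import math
from typing import List, Sequence


def residue_features(
    i: int,
    site_a: Sequence[int],
    site_b: Sequence[int],
    contact: Sequence[Sequence[int]],
) -> List[float]:
    """Build an assumed feature vector for residue i of sequence A.

    site_a / site_b: binary ipHMM site predictions for sequences A and B.
    contact: predicted binary contact matrix with shape len(A) x len(B).
    """
    row = contact[i]
    n_contacts = sum(row)  # predicted contacts involving residue i
    # Contacts whose partner in B is also predicted interacting by the ipHMM.
    n_supported = sum(c * b for c, b in zip(row, site_b))
    return [float(site_a[i]), float(n_contacts), float(n_supported)]


def logistic_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic regression score: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy inputs: 3 residues in A, 4 residues in B.
site_a = [1, 0, 0]
site_b = [0, 1, 1, 0]
contact = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
weights, bias = [1.0, 0.5, 1.0], -1.5  # illustrative values, not learned

for i in range(len(site_a)):
    x = residue_features(i, site_a, site_b, contact)
    print(i, x, round(logistic_score(x, weights, bias), 3))
```

In the toy example, the third residue of A is not flagged by the ipHMM, yet its predicted contact with a residue of B raises its score, which is the kind of "feedback" from the contact matrix that the caption describes.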
Selected amino acid properties from the AAindex database
| Property id | Property description |
|---|---|
| ANDN920101 | Alpha-CH chemical shifts |
| ARGP820101 | Hydrophobicity index |
| BEGF750101 | Conformational parameter of inner helix |
| BUNA790103 | Spin-spin coupling constants 3Jhalpha-NH |
| BHAR880101 | Average flexibility indices |
| BURA740102 | Normalized frequency of extended structure |
| GEOR030101 | Linker propensity from all dataset |
| CHOP780204 | Normalized frequency of N-terminal helix |
| CHOP780215 | Frequency of the 4th residue in turn |
| JOND920102 | Relative mutability |
| KHAG800101 | The Kerr-constant increments |
| FAUJ880104 | STERIMOL length of the side chain |
| PALJ810107 | Normalized frequency of alpha-helix in all-alpha class |
| RACS820114 | Value of theta(i-1) |
| WERD780103 | Free energy change of alpha(Ri) to alpha(Rh) |
| YUTK870102 | Unfolding Gibbs energy in water pH9.0 |
| CHAM830102 | A parameter defined from the residuals obtained from the best correlation of the Chou-Fasman parameter of beta-sheet |
Interaction site prediction performance of different models
| Model | Avg. accuracy (%) | Avg. F1 (%) | Avg. MCC (%) | Avg. precision (%) | Avg. recall (%) |
|---|---|---|---|---|---|
| ipHMM | 94.93 | 75.61 | 73.69 | 77.56 | 76.51 |
| CM-ipHMM | 96.97 | 90.05 | 89.11 | 85.98 | 96.83 |
| CM-only | 96.30 | 88.52 | 87.23 | 85.22 | 94.91 |
| Ground-truth-CM | 99.83 | 99.51 | 99.40 | 99.89 | 99.21 |
ipHMM: the interaction profile hidden Markov model used to predict interaction sites; CM-ipHMM: the logistic regression model built by integrating the contact matrix prediction with the ipHMM interaction site prediction; CM-only: the logistic regression model built with the predicted contact matrix only; Ground-truth-CM: the logistic regression model built with the ground-truth contact matrix and the ipHMM interaction site prediction
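For reference, the five metrics reported in the table can be computed from the confusion counts of a binary per-residue prediction. The sketch below uses the standard definitions; the function name and the toy counts are illustrative and are not taken from the paper's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1 and MCC from confusion counts.

    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Toy counts: 100 residues, 10 of them truly interacting.
m = binary_metrics(tp=8, fp=2, tn=88, fn=2)
print({name: round(value, 4) for name, value in m.items()})
```

Because the table reports averages, an averaged F1 need not equal the F1 computed from the averaged precision and recall, which is consistent with the ipHMM row, where the average F1 is lower than both.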
Fig. 3 Interaction site prediction performance comparison between the integrated CM-ipHMM model and the ipHMM model. CM-ipHMM (red) is the logistic regression model built by integrating the contact matrix prediction with the ipHMM interaction site prediction; ipHMM (blue) is the interaction profile hidden Markov model used to predict interaction sites