Haicang Zhang, Qi Zhang, Fusong Ju, Jianwei Zhu, Yujuan Gao, Ziwei Xie, Minghua Deng, Shiwei Sun, Wei-Mou Zheng, Dongbo Bu.
Abstract
BACKGROUND: Accurate prediction of the inter-residue contacts of a protein is important for calculating its tertiary structure. Analysis of co-evolutionary events among residues has proven effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of an MRF is accurate but time-consuming to calculate, whereas approximations to the actual likelihood, such as the pseudo-likelihood, are efficient to calculate but inaccurate. Thus, achieving both accuracy and efficiency simultaneously remains a challenge.
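The dilemma described above can be made concrete with the standard Potts-model form of an MRF over an aligned sequence. The block below is a sketch under stated assumptions, not the authors' exact formulation: the symbols $h_i$ (single-site fields), $J_{ij}$ (pairwise couplings), $q$ (number of residue states) and $Z$ (partition function) are introduced here for illustration, and the pairwise-conditional composite likelihood is one common variant that is assumed rather than taken from the text.

```latex
\begin{align}
% Actual likelihood: Z sums over all q^L sequences, hence the cost.
P(x) &= \frac{1}{Z}\exp\Big(\sum_{i=1}^{L} h_i(x_i) + \sum_{i<j} J_{ij}(x_i, x_j)\Big),
&
Z &= \sum_{x'} \exp\Big(\sum_{i=1}^{L} h_i(x'_i) + \sum_{i<j} J_{ij}(x'_i, x'_j)\Big) \\
% Pseudo-likelihood: each factor normalizes over only q states.
\mathrm{PL}(x) &= \prod_{i=1}^{L} P\big(x_i \mid x_{\setminus i}\big) \\
% Composite likelihood (pairwise-conditional variant, assumed):
% each factor normalizes over q^2 states.
\mathrm{CL}(x) &= \prod_{i<j} P\big(x_i, x_j \mid x_{\setminus \{i,j\}}\big)
\end{align}
```

Under these assumptions, the actual likelihood requires a sum over $q^L$ configurations, the pseudo-likelihood needs only $q$ terms per factor, and the pairwise composite likelihood sits in between with $q^2$ terms per factor while conditioning on less of the sequence.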
Keywords: Composite likelihood maximization; Deep learning; Markov random fields; Residue-residue contacts prediction
Year: 2019 PMID: 31664895 PMCID: PMC6821021 DOI: 10.1186/s12859-019-3051-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Contact prediction accuracy on PSICOV benchmark
| Methods | ||||||||
|---|---|---|---|---|---|---|---|---|
| PSICOV | 0.77 | 0.72 | 0.58 | 0.44 | 0.72 | 0.64 | 0.47 | 0.34 |
| mfDCA | 0.73 | 0.67 | 0.57 | 0.44 | 0.71 | 0.64 | 0.49 | 0.36 |
| plmDCA | 0.81 | 0.77 | 0.66 | 0.51 | 0.78 | 0.71 | 0.56 | 0.40 |
| clmDCA | 0.83 | 0.80 | 0.70 | 0.55 | 0.81 | 0.75 | 0.61 | 0.45 |
| plmDCA+DL | 0.92 | 0.90 | 0.85 | 0.75 | 0.89 | 0.86 | 0.74 | 0.59 |
| clmDCA+DL | 0.94 | 0.92 | 0.86 | 0.77 | 0.91 | 0.86 | 0.76 | 0.61 |
Fig. 1 Predicted contacts (top L/5; sequence separation >6 AA) for the protein structure with PDB ID 1ne2A by plmDCA and clmDCA. Red (green) dots indicate correct (incorrect) predictions, while grey dots indicate all true residue-residue contacts. a Comparison between clmDCA (upper-left triangle) and plmDCA (lower-right triangle). b Comparison between clmDCA (upper-left triangle) and clmDCA after refinement using a deep residual network (lower-right triangle)
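The evaluation setting named in the caption (top L/5 predictions, restricted to pairs above a sequence-separation threshold) can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the function name `top_k_precision`, its parameters, and the assumption that predictions arrive as a symmetric L x L score matrix are all hypothetical.

```python
import numpy as np


def top_k_precision(scores, true_contacts, k_divisor=5, min_separation=6):
    """Fraction of the top L/k_divisor predicted pairs that are true contacts.

    scores:         (L, L) symmetric array of predicted contact scores.
    true_contacts:  (L, L) boolean array marking native contacts.
    min_separation: only pairs with |i - j| > min_separation are ranked.
    """
    scores = np.asarray(scores, dtype=float)
    true_contacts = np.asarray(true_contacts, dtype=bool)
    length = scores.shape[0]

    # Upper-triangle pairs (i < j) with j - i > min_separation.
    i_idx, j_idx = np.triu_indices(length, k=min_separation + 1)

    # Number of predictions to evaluate, e.g. L/5 (at least one).
    n_top = max(1, length // k_divisor)

    # Rank candidate pairs by descending score (stable for reproducibility).
    order = np.argsort(-scores[i_idx, j_idx], kind="stable")[:n_top]
    hits = true_contacts[i_idx[order], j_idx[order]]
    return float(hits.mean())
```

Calling the function with `k_divisor` set to 10, 5, 2 or 1 gives the precision for top L/10, L/5, L/2 or L predictions, and `min_separation` set to 6 or 23 selects the separation threshold.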
Contact prediction accuracy on CASP-11 targets
| Methods | ||||||||
|---|---|---|---|---|---|---|---|---|
| PSICOV | 0.54 | 0.48 | 0.39 | 0.31 | 0.49 | 0.43 | 0.33 | 0.24 |
| mfDCA | 0.49 | 0.44 | 0.37 | 0.30 | 0.48 | 0.42 | 0.33 | 0.25 |
| plmDCA | 0.54 | 0.49 | 0.41 | 0.33 | 0.51 | 0.45 | 0.36 | 0.26 |
| clmDCA | 0.57 | 0.53 | 0.44 | 0.36 | 0.53 | 0.49 | 0.38 | 0.29 |
| plmDCA+DL | 0.77 | 0.71 | 0.60 | 0.48 | 0.50 | 0.46 | 0.38 | 0.29 |
| clmDCA+DL | 0.86 | 0.81 | 0.72 | 0.60 | 0.69 | 0.64 | 0.52 | 0.40 |
Fig. 2 The relationship between prediction accuracy and the quality of the MSA. Here, the quality of the MSA is measured using N, i.e. the number of effective homologous sequences. Dataset: PSICOV. Sequence separation: >6 AA
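One widely used way to count effective homologous sequences is to down-weight each sequence by the number of sequences in the alignment that are similar to it. The sketch below illustrates that idea only; the caption does not state how N was computed, so the 0.8 identity threshold, the function name `effective_sequence_count`, and the treatment of gaps as ordinary characters are all assumptions.

```python
import numpy as np


def effective_sequence_count(msa, identity_threshold=0.8):
    """Sum of per-sequence weights 1 / (number of similar sequences).

    msa: list of equal-length aligned sequences (strings).
    identity_threshold: two sequences count as similar when their
        fractional identity is at least this value (assumed, not from
        the source).
    """
    arr = np.array([list(seq) for seq in msa])

    # Pairwise fractional identity over alignment columns: (n_seq, n_seq).
    identity = (arr[:, None, :] == arr[None, :, :]).mean(axis=2)

    # Each sequence is similar to itself, so every count is >= 1.
    neighbours = (identity >= identity_threshold).sum(axis=1)
    return float((1.0 / neighbours).sum())
```

With this weighting, an alignment of many near-identical sequences yields a small effective count, whereas a diverse alignment yields a count close to its raw size. The all-pairs comparison uses memory proportional to the square of the number of sequences, so it is suited to small illustrations rather than large alignments.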
Fig. 3 Native structure and predicted structures for the protein with PDB ID 1vmbA. a Native structure. b Structure built using contacts predicted by plmDCA (TM-score: 0.42). c Structure built using contacts predicted by clmDCA alone (TM-score: 0.55). d Structure built using contacts predicted by clmDCA together with deep learning for refinement (TM-score: 0.72)
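For readers unfamiliar with the metric quoted in the caption, the TM-score is commonly defined as below. This is the standard definition, given here as background; the caption does not say which program or settings produced the reported values. The symbols $L_{\text{target}}$ (length of the native structure), $L_{\text{ali}}$ (number of aligned residue pairs) and $d_i$ (distance between the $i$-th aligned pair after superposition) are notation introduced for this block.

```latex
\begin{align}
\mathrm{TM\text{-}score} &= \max_{\text{superpositions}}
  \left[ \frac{1}{L_{\text{target}}} \sum_{i=1}^{L_{\text{ali}}}
  \frac{1}{1 + \big(d_i / d_0(L_{\text{target}})\big)^{2}} \right], \\
d_0(L_{\text{target}}) &= 1.24\,\sqrt[3]{L_{\text{target}} - 15} - 1.8
\end{align}
```

The score lies in (0, 1], with higher values indicating closer agreement with the native structure, so the progression 0.42, 0.55, 0.72 in the caption corresponds to increasingly accurate models.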
Fig. 4 Procedure of clmDCA to predict inter-residue contacts. a For a query protein (1wlg_A as an example), we identified its homologues by running HHblits [59] against the nr90 sequence database (parameter setting: j:3,id:90,cov:70) and constructed a multiple sequence alignment of these proteins. b The correlation among residues in the MSA was disentangled using the composite likelihood maximization technique, generating a prediction of inter-residue contacts. c The predicted contacts were fed into a deep neural network for refinement. d The refined prediction of inter-residue contacts
Fig. 5 Comparison of the prediction accuracy of the top L/2 contacts reported by plmDCA (y-axis) and clmDCA (x-axis) with two sequence separation thresholds on the PSICOV dataset. a Sequence separation >6 AA. b Sequence separation >23 AA