Yujuan Gao, Sheng Wang, Minghua Deng, Jinbo Xu.
Abstract
BACKGROUND: Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging.
Keywords: Clustering; Deep learning; Dihedral angle prediction; Protein structure prediction; Residual network
Year: 2018 PMID: 29745828 PMCID: PMC5998898 DOI: 10.1186/s12859-018-2065-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Illustration of the ResNet model in RaptorX-Angle. (a) A building block of ResNet with x_l and x_{l+1} being input and output, respectively. (b) The ResNet model architecture as a classifier with stacked residual blocks and a logistic regression layer. Here L is the sequence length of the protein (the total number of residues under prediction) and K is the number of clusters
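For orientation, a residual building block with an identity shortcut is commonly written as below. This is the generic ResNet formulation, not a statement of the exact block used in RaptorX-Angle; the symbols F (the stacked layers inside the block) and W_l (their weights) are assumptions introduced here for illustration.

$$x_{l+1} = x_l + F(x_l, W_l)$$

The shortcut term x_l lets each block learn only a correction to its input, which is what makes deep stacks of such blocks trainable in practice.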
Fig. 2 Selecting a proper number of clusters K. Left: relationship between the entropy loss of discrete label probabilities and the number of clusters K; Right: relationship between the log-likelihood of the mixture of bivariate von Mises distributions and the number of clusters K
The mean absolute error of different feature combinations with ResNet 5-100-3 on TS1267
| Feature combination | Phi | Psi | Phi_H | Psi_H | Phi_E | Psi_E | Phi_C | Psi_C |
|---|---|---|---|---|---|---|---|---|
| basic1=PSSM+aa | 19.97 | 31.97 | 9.82 | 17.57 | 20.70 | 26.97 | 29.84 | 49.66 |
| basic2=PSFM+aa | 20.02 | 31.78 | 9.86 | 17.68 | 20.46 | 26.38 | 30.10 | 49.39 |
| basic=PSSM+PSFM+aa | 19.27 | 30.04 | 9.11 | 15.58 | 19.70 | 24.64 | 29.35 | 48.02 |
| basic+ACC | 19.08 | 29.30 | 9.07 | 15.44 | 19.36 | 23.18 | 29.11 | 47.10 |
| basic+SS | 19.19 | 28.73 | 8.56 | 13.76 | 19.29 | 22.43 | 31.00 | 47.95 |
| basic+ACC+SS | 18.58 | 27.98 | 8.45 | 13.37 | 19.03 | 22.14 | 28.61 | 46.21 |
Phi and Psi denote MAE for all residues; Phi_H and Psi_H denote MAE for residues in helix regions; Phi_E and Psi_E denote MAE for residues in beta strand regions; Phi_C and Psi_C denote MAE for residues in coil regions.
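The MAE values above are in degrees. The record does not spell out how the error is computed, so the following is only a minimal sketch under the assumption that angles are treated as periodic, meaning the error between a predicted and a true angle is the shorter way around the circle. The function names and the sample values are hypothetical.

```python
def angular_abs_error(pred_deg, true_deg):
    """Absolute difference between two angles in degrees, wrapped to [0, 180]."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(pred, true):
    """Mean absolute error over paired angle lists (assumed periodic, in degrees)."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(angular_abs_error(p, t) for p, t in zip(pred, true)) / len(pred)


# Hypothetical predicted and true phi angles, not taken from the paper.
pred_phi = [-60.0, 175.0, -120.0]
true_phi = [-65.0, -178.0, -110.0]
print(angular_mae(pred_phi, true_phi))
```

Without the wrap-around, the second pair (175° versus -178°) would count as a 353° error instead of 7°, which would dominate the average.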
Pearson correlation coefficient of cosine values between predicted and true angles
| Method | TS1267 cos(ϕ)/cos(ψ) | CASP11 cos(ϕ)/cos(ψ) | CASP12 cos(ϕ)/cos(ψ) |
|---|---|---|---|
| RaptorX-Angle | | | |
| SPIDER2 | 0.6893/0.7427 | 0.6485/0.7095 | 0.6299/0.6761 |
| SPINE X | 0.6410/0.6543 | 0.5015/0.4891 | 0.4990/0.5039 |
| ANGLOR | 0.4775/0.6226 | 0.4437/0.5868 | 0.4431/0.5772 |
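As a rough illustration of the metric in this table, the sketch below computes the Pearson correlation coefficient between the cosines of predicted and true angles. It uses only the standard definition of Pearson correlation; the function names are hypothetical and the input is assumed to be in degrees.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def cosine_pcc(pred_deg, true_deg):
    """Pearson correlation between cos(predicted) and cos(true) angles."""
    cos_pred = [math.cos(math.radians(a)) for a in pred_deg]
    cos_true = [math.cos(math.radians(a)) for a in true_deg]
    return pearson(cos_pred, cos_true)
```

Because cosine is an even function, this measure cannot distinguish an angle from its negative, so it complements the mean absolute error rather than replacing it.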
Mean absolute error (°) of four methods for different secondary structural regions on three benchmarks: TS1267, 85 CASP11 targets and 40 CASP12 targets
| Benchmark | Method | Phi | Psi | Phi_H | Psi_H | Phi_E | Psi_E | Phi_C | Psi_C |
|---|---|---|---|---|---|---|---|---|---|
| TS1267 | RaptorX-Angle | | | | | | | | |
| TS1267 | SPIDER2 | 18.57 | 28.02 | 8.59 | 14.52 | 19.28 | 23.09 | 28.28 | 44.73 |
| TS1267 | SPINE X | 20.31 | 34.05 | 9.32 | 16.69 | 22.23 | 31.23 | 30.32 | 53.42 |
| TS1267 | ANGLOR | 24.01 | 43.59 | 9.29 | 26.41 | 27.47 | 40.88 | 36.89 | 62.72 |
| CASP11 | RaptorX-Angle | | | | | | | | 46.89 |
| CASP11 | SPIDER2 | 20.18 | 30.32 | 9.53 | 16.05 | 19.77 | 24.50 | 29.88 | |
| CASP11 | SPINE X | 24.85 | 46.58 | 13.57 | 29.65 | 26.25 | 43.65 | 33.88 | 63.49 |
| CASP11 | ANGLOR | 25.69 | 46.17 | 9.99 | 27.72 | 28.08 | 43.85 | 37.96 | 64.03 |
| CASP12 | RaptorX-Angle | | | 9.28 | | | | | |
| CASP12 | SPIDER2 | 21.13 | 34.17 | | 17.19 | 21.35 | 28.56 | 31.95 | 52.76 |
| CASP12 | SPINE X | 24.85 | 46.57 | 11.52 | 26.34 | 26.98 | 46.04 | 35.85 | 65.33 |
| CASP12 | ANGLOR | 25.79 | 47.37 | 9.69 | 28.81 | 29.11 | 44.79 | 38.65 | 65.74 |
Notation is the same as in Table 1.
Fig. 3 Mean absolute error performance for different clusters in VL1267. Left: visualization of the 20 cluster centers on the Ramachandran plot, with a smaller number indicating a smaller cluster size. Right: mean absolute error for different clusters
Mean absolute error performance for each amino acid type in VL1267
| Amino acid | Abundance | Frequency (%) | Phi MAE (°) | Psi MAE (°) |
|---|---|---|---|---|
| A (Ala) | 22527 | 8.46 | 13.87 | 22.92 |
| C (Cys) | 3151 | 1.18 | 20.50 | 28.66 |
| D (Asp) | 15946 | 5.99 | 20.71 | 30.80 |
| E (Glu) | 18326 | 6.89 | 14.75 | 23.97 |
| F (Phe) | 10812 | 4.06 | 18.13 | 26.10 |
| G (Gly) | 19133 | 7.19 | 43.32 | 39.59 |
| H (His) | 5989 | 2.25 | 22.04 | 31.12 |
| I (Ile) | 15302 | 5.75 | 12.79 | 20.12 |
| K (Lys) | 15299 | 5.75 | 16.71 | 25.83 |
| L (Leu) | 24731 | 9.29 | 12.49 | 21.37 |
| M (Met) | 5833 | 2.19 | 16.71 | 24.86 |
| N (Asn) | 11383 | 4.28 | 27.38 | 32.04 |
| P (Pro) | 11977 | 4.50 | 8.84 | 33.00 |
| Q (Gln) | 10163 | 3.82 | 15.96 | 24.72 |
| R (Arg) | 13529 | 5.08 | 16.81 | 25.45 |
| S (Ser) | 15991 | 6.01 | 20.83 | 33.92 |
| T (Thr) | 14309 | 5.38 | 17.12 | 30.92 |
| V (Val) | 18612 | 6.99 | 13.70 | 20.94 |
| W (Trp) | 3854 | 1.45 | 18.05 | 27.61 |
| Y (Tyr) | 9287 | 3.49 | 18.83 | 27.02 |
| Total | 266154 | 100 | 18.32 | 27.15 |
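The Frequency column appears to be each amino acid's abundance expressed as a percentage of the total residue count. The short check below, a sketch using the abundance values from the table and a hypothetical helper `frequency_percent`, shows how the two columns relate under that assumption.

```python
# Abundance values copied from the table above (residue counts in VL1267).
abundance = {
    "A": 22527, "C": 3151, "D": 15946, "E": 18326, "F": 10812,
    "G": 19133, "H": 5989, "I": 15302, "K": 15299, "L": 24731,
    "M": 5833, "N": 11383, "P": 11977, "Q": 10163, "R": 13529,
    "S": 15991, "T": 14309, "V": 18612, "W": 3854, "Y": 9287,
}

total = sum(abundance.values())  # matches the "Total" row of the table


def frequency_percent(count, total_count):
    """Share of one amino acid in the data set, in percent, to two decimals."""
    return round(100.0 * count / total_count, 2)


for residue in ("A", "G", "P"):
    print(residue, frequency_percent(abundance[residue], total))
```

The per-residue counts sum to the 266154 reported in the Total row, and the computed percentages for Ala, Gly and Pro agree with the tabulated 8.46, 7.19 and 4.50.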
Fig. 4 Mean absolute error performance for different protein classes in VL1267. Left: for ϕ prediction. Right: for ψ prediction
Fig. 5 Mean standard deviation for different secondary structural regions in TS1267
Fig. 6 Relationship between prediction error and standard deviation. The eight points correspond to the two kinds of angles (ϕ, ψ) in four secondary structural regions (total, helix, strand, coil)