Jaspreet Singh, Thomas Litfin, Jaswinder Singh, Kuldip Paliwal, Yaoqi Zhou.
Abstract
MOTIVATION: Accurate prediction of protein contact maps is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact-map prediction. However, most methods rely on evolutionary information derived from protein sequences, which may not exist for many proteins because they lack naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor that takes the output of the pre-trained language model ESM-1b as input, together with a large training set and an ensemble of residual neural networks.
Year: 2022 PMID: 35104320 PMCID: PMC9113311 DOI: 10.1093/bioinformatics/btac053
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Overview of the model pipeline
A description of feature combinations for the ensemble of trained models
| Models | Features | Training strategy |
|---|---|---|
| Model1 | Attention map (last layer) | Direct inter-residue contact prediction |
| Model2 | Attention map (all layers) | Direct inter-residue contact prediction |
| Model3 | Attention map (all layers) + one-hot encoding + SPOT-1D-Single | Direct inter-residue contact prediction |
| Model4 | Attention map (last layer) | Inter-residue distance bin prediction |
| Model5 | Attention map (all layers) | Inter-residue distance bin prediction |
| Model6 | Attention map (all layers) + one-hot encoding + SPOT-1D-Single | Inter-residue distance bin prediction |
Comparison of model precision for ResNet12 trained on different feature combinations, for medium- and long-range contacts on the SPOT-2018 test set
| Model | Features | Medium range L/10 | Medium range L/5 | Medium range L/2 | Medium range L/1 | Long range L/10 | Long range L/5 | Long range L/2 | Long range L/1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | One-hot encoding | 20.79 | 17.39 | 13.04 | 9.98 | 7.02 | 6.15 | 5.04 | 4.40 |
| 2 | One-hot encoding + SPOT-1D-Single | 21.40 | 18.00 | 13.90 | 10.20 | 10.00 | 8.12 | 7.06 | 5.40 |
| 3 | ESM-1b attention map (last layer only) | 39.17 | 31.84 | 21.75 | 14.84 | 35.14 | 30.34 | 22.75 | 17.02 |
| 4 | ESM-1b attention map (all layers) | 40.03 | 32.83 | 22.49 | 15.26 | 36.03 | 30.92 | 23.75 | 18.13 |
| 5 | All features | 42.03 | 34.38 | 23.32 | 15.65 | 38.75 | 33.23 | 25.22 | 18.94 |
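The L/10, L/5, L/2 and L/1 columns report the precision of the top-ranked predicted pairs, where L is the sequence length. A minimal sketch of that metric follows. The function name, the nested-list input format and the sequence-separation ranges (medium: 12 to 23 residues, long: 24 or more) are assumptions taken from common practice, not details stated here.

```python
def top_lk_precision(prob, native, k, min_sep, max_sep=None):
    """Precision (%) of the top L/k predicted pairs in a separation range.

    prob    : L x L nested list of predicted contact probabilities
    native  : L x L nested list of 0/1 native contacts
    k       : divisor of L (10, 5, 2 or 1)
    min_sep : smallest allowed |i - j| (assumed 12 for medium, 24 for long)
    max_sep : exclusive upper bound on |i - j|, or None for no bound
    """
    L = len(prob)
    pairs = []
    # Collect each residue pair once (upper triangle) if it lies in the range.
    for i in range(L):
        for j in range(i + 1, L):
            sep = j - i
            if sep >= min_sep and (max_sep is None or sep < max_sep):
                pairs.append((prob[i][j], native[i][j]))
    # Rank candidate pairs by predicted probability, highest first.
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[:max(1, L // k)]
    if not top:
        return 0.0
    # Assumption: divide by the number of pairs actually selected, which can
    # be fewer than L/k for short sequences.
    return 100.0 * sum(is_contact for _, is_contact in top) / len(top)
```

Under these assumptions, `top_lk_precision(prob, native, 10, 24)` would give the long-range L/10 value for one protein, and `top_lk_precision(prob, native, 10, 12, 24)` the medium-range one. How per-protein values are aggregated into the table entries is not specified here.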
Precision comparison of two training strategies, direct contact prediction and distogram contact prediction, for medium- and long-range contacts on the SPOT-2018 set
| Model | Medium range L/10 | Medium range L/5 | Medium range L/2 | Medium range L/1 | Long range L/10 | Long range L/5 | Long range L/2 | Long range L/1 |
|---|---|---|---|---|---|---|---|---|
| Direct Contact Prediction | 42.03 | 34.38 | 23.32 | 15.65 | 38.75 | 33.23 | 25.22 | 18.94 |
| Distogram Contact Prediction | 40.52 | 33.37 | 22.59 | 15.33 | 37.43 | 32.22 | 24.31 | 18.44 |
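To compare a distance-bin (distogram) model with a direct contact predictor, its per-pair bin probabilities have to be reduced to a single contact probability. One common way, sketched below, is to sum the probability mass of all bins that lie within a contact cutoff. The 8 Å cutoff, the bin edges and the function name are illustrative assumptions; the binning actually used is not given here.

```python
def distogram_to_contact(bin_probs, bin_upper_edges, cutoff=8.0):
    """Collapse one residue pair's distance-bin distribution to P(contact).

    bin_probs       : probabilities over distance bins (should sum to 1)
    bin_upper_edges : upper edge of each bin in angstroms (hypothetical)
    cutoff          : contact distance threshold (assumed 8.0 angstroms)
    """
    if len(bin_probs) != len(bin_upper_edges):
        raise ValueError("one upper edge is required per bin")
    # A bin counts toward "contact" only if it lies entirely within the cutoff.
    return sum(p for p, edge in zip(bin_probs, bin_upper_edges) if edge <= cutoff)
```

For example, with hypothetical bins ending at 4, 6, 8 and 10 Å and probabilities 0.1, 0.2, 0.3 and 0.4, the first three bins fall within the cutoff, so the contact probability is about 0.6.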
Comparison of individual model precision to the precision of the ensemble of models for long-range and medium-range contacts on the SPOT-2018 test set
| Model | Medium range L/10 | Medium range L/5 | Medium range L/2 | Medium range L/1 | Long range L/10 | Long range L/5 | Long range L/2 | Long range L/1 |
|---|---|---|---|---|---|---|---|---|
| Model1 | 39.17 | 31.84 | 21.75 | 14.84 | 35.14 | 30.34 | 22.75 | 17.02 |
| Model2 | 40.03 | 32.83 | 22.49 | 15.26 | 36.03 | 30.92 | 23.75 | 18.13 |
| Model3 | 42.03 | 34.38 | 23.32 | 15.65 | 38.75 | 33.23 | 25.22 | 18.94 |
| Model4 | 38.52 | 31.53 | 21.86 | 14.85 | 35.32 | 30.19 | 22.80 | 17.20 |
| Model5 | 40.16 | 32.85 | 22.13 | 14.93 | 37.34 | 31.82 | 23.87 | 17.77 |
| Model6 | 40.52 | 33.37 | 22.59 | 15.33 | 37.43 | 32.22 | 24.31 | 18.44 |
| Ensemble | 42.43 | 34.41 | 23.63 | 15.88 | 39.60 | 34.35 | 25.94 | 19.62 |
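How the six models are combined into the ensemble is not described here. As an illustration only, the sketch below assumes the simplest scheme: an unweighted element-wise mean of the per-model contact-probability maps. The function name and input format are hypothetical.

```python
def ensemble_mean(maps):
    """Element-wise unweighted mean of several L x L contact-probability maps.

    maps : non-empty list of L x L nested lists, one per model
    Assumption: every model contributes equally; the weighting actually used
    by the ensemble is not specified in the text.
    """
    if not maps:
        raise ValueError("at least one map is required")
    n = len(maps)
    L = len(maps[0])
    if any(len(m) != L or any(len(row) != L for row in m) for m in maps):
        raise ValueError("all maps must be square with the same size")
    return [[sum(m[i][j] for m in maps) / n for j in range(L)]
            for i in range(L)]
```

Averaging probabilities rather than thresholded contacts keeps the ranking information that top-L/k precision depends on.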
Fig. 2. Precision-based comparison of SPOT-Contact-LM, SPOT-Contact, trRosetta and ESM-1b on Neff1-2018 for short-, medium- and long-range contacts
Fig. 3. F1-score as a function of the number of effective homologous sequences (Neff) by SPOT-Contact-LM compared with other methods on SPOT-2018 for contact-map prediction
Fig. 4. Comparison of the predictions for the 5YKZ_A protein by four methods as labeled. The upper triangle and lower triangle represent the native and the predicted contact map, respectively
Precision-based comparison of SPOT-Contact-LM, SSCpred, ESM-1b and SPOT-Contact on the CASP14-FM set for medium- and long-range contacts
| Model | Medium range L/10 | Medium range L/5 | Medium range L/2 | Medium range L/1 | Long range L/10 | Long range L/5 | Long range L/2 | Long range L/1 |
|---|---|---|---|---|---|---|---|---|
| SPOT-Contact-LM | 29.73 | 24.72 | 17.96 | 13.93 | 18.92 | 19.38 | 15.40 | 11.56 |
| SSCpred | 26.13 | 24.50 | 17.17 | 12.61 | 9.91 | 9.13 | 7.66 | 7.69 |
| ESM-1b | 22.97 | 19.82 | 15.58 | 10.85 | 17.12 | 12.47 | 9.42 | 7.38 |
| SPOT-Contact (profile) | 41.44 | 36.08 | 26.41 | 17.09 | 25.23 | 21.16 | 19.28 | 16.21 |